LOCALLY ADAPTIVE ESTIMATION OF EVOLUTIONARY WAVELET SPECTRA

We introduce a wavelet-based model of local stationarity. This
model enlarges the class of locally stationary wavelet processes and
contains processes whose spectral density function may change very
suddenly in time. A notion of time-varying wavelet spectrum is uniquely
defined as a wavelet-type transform of the autocovariance function
with respect to so-called autocorrelation wavelets. This leads to a natural
representation of the autocovariance which is localized on scales.
We propose a pointwise adaptive estimator of the time-varying spectrum.
The behavior of the estimator is studied in homogeneous and
inhomogeneous regions of the wavelet spectrum.

1. Introduction. The spectral analysis of time series is a large field of
great interest from both theoretical and practical viewpoints. The fundamental
starting point for this analysis is the Cramér representation, stating
that every zero-mean second order stationary process $X_t$, $t \in \mathbb{Z}$, may be written

$$X_t = \int_{[-\pi,\pi)} A(\omega)\exp(i\omega t)\,dZ(\omega), \qquad t \in \mathbb{Z}, \tag{1.1}$$

where $A(\omega)$ is the amplitude of the process $X_t$ and $dZ(\omega)$ is an orthonormal
increment process, that is, $E(dZ(\omega)\,\overline{dZ(\mu)}) = d\omega\,\delta_0(\omega - \mu)$; see Brillinger
(1975). Correspondingly, under mild conditions, the autocovariance function
can be expressed as

$$c_X(\tau) = \int_{[-\pi,\pi)} f_X(\omega)\exp(i\omega\tau)\,d\omega,$$

where $f_X$ is the spectral density of $X_t$.

There is not a unique way to relax the assumption of stationarity, that is, 
to define a second order process with a time-dependent spectrum. However, 
this modeling is a theoretical challenge which may be helpful in practice 
since many studies have shown that models with evolutionary spectra or 
time-varying parameters are necessary to explain some observed data, even 
over short periods of time. Examples may be found in numerous fields, 
such as economics [Swanson and White (1997), Los (2000)], biostatistics 
[Ombao et al. (2002)] and meteorology [Nason and Sapatinas (2002)], to 
name but a few. 

Among the different possibilities for modeling nonstationary second order 
processes, we emphasize the approaches consisting of a modification of the 
Cramér representation (1.1). Different modifications of (1.1) are possible.
First, we can replace the process $dZ(\omega)$ by a nonorthonormal process, leading
to, for instance, the harmonizable processes [Lii and Rosenblatt (2002)].
A second possibility is to replace the amplitude function $A(\omega)$ by a time-varying
version $A_t(\omega)$ and to assume a slow change of $A_t(\omega)$ over time. Such
an approach is followed to define oscillatory processes [Priestley (1965)].
However, a major statistical drawback of the oscillatory processes is the 
intrinsic impossibility of constructing an asymptotic theory for consistency 
and inference. To overcome this problem, Dahlhaus (1997) introduced the 
class of locally stationary processes, in which the transfer function is rescaled 
in time. In this approach, a doubly-indexed process is defined as 

$$X_{t,T} = \int_{-\pi}^{\pi} A\Big(\frac{t}{T},\omega\Big)\exp(i\omega t)\,dZ(\omega), \qquad t = 0,\ldots,T-1,\ T > 0, \tag{1.2}$$

where the transfer function $A(z,\omega)$ is defined on $(0,1) \times [-\pi,\pi)$. Dahlhaus
(1997, 2000) investigated statistical inference for such processes, with a dis- 
cussion on maximum likelihood, Whittle and least squares estimates, and 
showed that asymptotic results when T tends to infinity can be considered. 
However, in this setting, letting $T$ tend to infinity does not have the usual
meaning of "looking into the future," but means that we have, in the sample
$X_{0,T},\ldots,X_{T-1,T}$, more information about the local structure of $A(z,\omega)$.
This formalism is analogous to nonparametric regression, for which "asymp- 
totic" means an ideal knowledge about the local structure of the underlying 
curve. 

In this article, we focus on a class of doubly-indexed locally stationary 
processes defined by replacing the harmonic system $\{\exp(i\omega t)\}$ in (1.2) by
a wavelet system. In this way, we move from a time-frequency representation
to a time-scale representation of the nonstationary process. Because
wavelet systems are well localized in time and frequency, they appear more
natural for modeling the time-varying spectra of nonstationary processes.
As wavelets decompose the frequency domain into discrete scales, they offer
a well-adapted system to achieve the resolution trade-off between time and
frequency [Vidakovic (1999)].

The class of locally stationary wavelet processes studied in this article 
was initially introduced by Nason, von Sachs and Kroisandt (2000). Their 
definition of wavelet processes involves a time-varying amplitude which is 
smoothly varying and continuous as a function of time. An initial goal of 
this article is to extend this definition to the case of time-varying ampli- 
tudes with possibly discontinuous behavior in time. This introduces some 
technical difficulties to the proof of our results, but we believe the gain due 
to this extension to be crucial. Our new definition now includes more im- 
portant examples of nonstationary processes. For instance, this extension of 
the definition is needed if we wish to model a nonstationary process built 
as a concatenation of different processes, such as the Haar processes defined 
in Nason et al. (2000). Moreover, wavelet processes can now be used for the 
analysis of intermittent phenomena, such as transients followed by regions 
of smooth behavior. 

Our definition of wavelet processes is presented in Section 2, where we 
also define their evolutionary spectrum. This spectrum is a function of time 
and scales, and measures the power of the process at a particular time and 
scale. The main goal of the present article is to provide a pointwise adaptive 
estimation of the evolutionary spectrum. The estimation procedure follows 
the local adaptive method of Lepski (1990). The main difference with the 
latter is that we are now estimating a spectral density function, that is, the 
second order structure of correlated observations. Moreover, our statistical 
model is allowed to be nonstationary and the behavior of its evolutionary 
spectrum may be very inhomogeneous in time. 

In Section 3, we present a preliminary estimator of the evolutionary spec- 
trum and derive some useful properties that are needed in order to de- 
rive the adaptive estimator in Section 4. The behavior of this estimator 
is discussed for the two cases where the evolutionary wavelet spectrum is 
either regular or irregular near the point of estimation. These results ex- 
plain the good performance of the algorithm in practice. Section 5 con- 
cludes with the result of a brief simulation study. All details and specific 
questions related to the practical implementation of our procedure have 
been considered in a separate paper [Van Bellegem and von Sachs (2004)], 
where a more exhaustive study of simulations and a real data analysis are 
provided. 

Proofs and technical derivations are deferred to the appendices. Our estimator
is the sum of a quadratic form of the increments, which are assumed to be
Gaussian, and an additive, independent linear form of Gaussian variables.
Thus, the main technical goal is to study the
behavior of the (quadratic + linear) form of Gaussian variables. There exists
a large body of results on quadratic forms of Gaussian variables. Recent developments
include Rudzkis (1978), Neumann (1996), Laurent and Massart
(2000), Spokoiny (2001), Comte (2001) and Dahlhaus and Polonik (2002). 
The exponential inequality proved in the latter reference is the starting point 
for some important results in the present article. On the other hand, in the 
appendices, we also present some original results on quadratic forms that 
are needed to prove our results. 

2. Locally stationary wavelet processes. The wavelet system used to 
build locally stationary processes is a nondecimated system of compactly 
supported and discrete wavelets. We first briefly recall some points about 
this system of wavelets and then give a definition of the wavelet processes 
and wavelet spectra. 

2.1. Discrete nondecimated wavelet system. The local functions used in 
the representation of LSW processes are a set of discrete nondecimated
wavelets $\{\psi_{jk},\ j = -1,-2,\ldots;\ k \in \mathbb{Z}\}$. We refer to Vidakovic (1999) for a review
of wavelet theory and its applications in statistics, and to
Nason and Silverman (1995) for a detailed introduction to the nondecimated
wavelet transform. Let us simply recall that, in contrast to the discrete
wavelet transform, the discrete nondecimated wavelets at all scales $j < 0$
can be shifted to any location defined by the finest resolution scale, determined
by the observed data. As a consequence, this construction leads
to an overcomplete system of the space of square summable sequences,
$\ell^2(\mathbb{Z})$. The wavelets considered in this article are assumed to be compactly
supported in time and we will denote by $\mathcal{L}_j$ the length of the support of
$\psi_{j0}$, that is, $\mathcal{L}_j := |\operatorname{supp}\psi_{j0}|$. This automatically implies $|\operatorname{supp}\psi_{jk}| = \mathcal{L}_j =
(2^{-j} - 1)(\mathcal{L}_{-1} - 1) + 1$ for all $j < 0$. Also, observe that, as in Nason et al.
(2000), we departed from the usual wavelet numbering scheme. The data 
inhabit scale zero, and scale —1 is the scale which contains the finest res- 
olution wavelet detail. Then, the support of the wavelet on the finest scale 
remains constant with respect to T. 
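
As a quick check of the support-length formula, writing $\mathcal{L}_j$ for $|\operatorname{supp}\psi_{j0}|$, the following worked instance evaluates it for two values of $\mathcal{L}_{-1}$. The Haar value $\mathcal{L}_{-1} = 2$ follows from the Haar system recalled below; the value $\mathcal{L}_{-1} = 4$ is only an illustrative assumption (a four-tap filter) and is not a case treated in the text.

```latex
% Haar: \mathcal{L}_{-1} = 2
\mathcal{L}_j = (2^{-j}-1)(2-1)+1 = 2^{-j},
\qquad \mathcal{L}_{-1}=2,\ \mathcal{L}_{-2}=4,\ \mathcal{L}_{-3}=8.
% Illustrative four-tap filter: \mathcal{L}_{-1} = 4
\mathcal{L}_j = 3\,(2^{-j}-1)+1 = 3\cdot 2^{-j}-2,
\qquad \mathcal{L}_{-1}=4,\ \mathcal{L}_{-2}=10,\ \mathcal{L}_{-3}=22.
```

In both cases the support roughly doubles from one scale to the next coarser one, while the finest-scale support does not depend on $T$.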

For ease of presentation, recall the simplest discrete nondecimated system,
called the Haar system, given by

$$\psi_{jk} = 2^{j/2}\,I_{\{0,\ldots,2^{-j-1}-1\}}(k) - 2^{j/2}\,I_{\{2^{-j-1},\ldots,2^{-j}-1\}}(k)$$

for $j = -1,-2,\ldots$ and $k \in \mathbb{Z}$,

where $I_A(t)$ is 1 if $t \in A$ and 0 otherwise. The shifted version of $\psi_{jk}$ is given
by $\psi_{jk}(t) = \psi_{j,k-t}$ for all $k \in \mathbb{Z}$.
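
The Haar definition above can be sketched numerically as follows. This is a minimal illustration, not code from the paper: the function name `haar_wavelet` and the use of NumPy are assumptions made here for the example.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = -1, -2, ... as a vector of length 2**(-j).

    The first half of the support carries +2**(j/2), the second half -2**(j/2).
    """
    if j >= 0:
        raise ValueError("scales are negative integers: j = -1, -2, ...")
    half = 2 ** (-j - 1)
    amplitude = 2.0 ** (j / 2)
    return amplitude * np.concatenate([np.ones(half), -np.ones(half)])

for j in (-1, -2, -3):
    psi = haar_wavelet(j)
    # each wavelet sums to zero and has unit Euclidean norm
    print(j, len(psi), psi.sum(), np.sum(psi ** 2))
```

The support length printed for each scale is $2^{-j}$, in agreement with the support formula of this section for the Haar case.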

2.2. The process and its evolutionary wavelet spectrum. As we will note
below, our definition of locally stationary wavelet processes differs from the 
original definition of Nason et al. (2000) as we only impose a total varia- 
tion condition on the amplitudes instead of a Lipschitz condition. See also 
Fryzlewicz and Nason (2006) for a discussion of that definition. 

Definition 1. A sequence of doubly-indexed stochastic processes $X_{t,T}$
($t = 0,\ldots,T-1$, $T > 0$) with mean zero is in the class of locally stationary
wavelet processes (LSW processes) if there exists a representation

$$X_{t,T} = \sum_{j=-\infty}^{-1}\sum_{k=0}^{T-1} w_{jk;T}\,\psi_{jk}(t)\,\xi_{jk}, \tag{2.1}$$

where the infinite sum is to be understood in the mean-square sense,
$\{\psi_{jk}(t) = \psi_{j,k-t}\}_{jk}$ with $j < 0$ is a discrete nondecimated family of wavelets based on
a mother wavelet $\psi(t)$ of compact support, and such that the following conditions
are satisfied:

1. $\xi_{jk}$ is a random orthonormal increment sequence such that $E\xi_{jk} = 0$ and
$\operatorname{Cov}(\xi_{jk},\xi_{\ell m}) = \delta_{j\ell}\delta_{km}$ for all $j,\ell,k,m$, where $\delta_{j\ell} = 1$ if $j = \ell$ and 0 elsewhere.

2. For each $j \le -1$, there exists a function $W_j(z)$ on $(0,1)$ possessing the
following properties:

(a) $\sum_{j=-\infty}^{-1} |W_j(z)|^2 < C < \infty$ uniformly in $z \in (0,1)$;

(b) there exists a sequence of constants $C_j$ such that for each $T$

$$\sup_{k=0,\ldots,T-1}\Big|w_{jk;T} - W_j\Big(\frac{k}{T}\Big)\Big| \le \frac{C_j}{T}; \tag{2.2}$$

(c) the total variation of $W_j^2(z)$ is bounded by $L_j$, that is,

$$\operatorname{TV}(W_j^2) := \sup\Big\{\sum_{i=1}^{I} |W_j^2(a_i) - W_j^2(a_{i-1})| : 0 < a_0 < a_1 < \cdots < a_I < 1,\ I \in \mathbb{N}\Big\} \le L_j; \tag{2.3}$$

(d) the constants $C_j$ and $L_j$ are such that

$$\sum_{j=-\infty}^{-1} \mathcal{L}_j(\mathcal{L}_j L_j + C_j) \le \rho < \infty, \tag{2.4}$$

where $\mathcal{L}_j = |\operatorname{supp}\psi_{j0}| = (2^{-j} - 1)(\mathcal{L}_{-1} - 1) + 1$.

LSW processes use wavelets to decompose a stochastic process with respect
to an orthogonal increment process in the time-scale plane. Due to
the overcompleteness of the nondecimated system, a given LSW process
does not determine the sequence $\{w_{jk;T}\}$ uniquely. However, we can build a
theory which ensures the existence of a unique wavelet spectrum (in a sense
defined after Proposition 1 below). This property is a consequence of the
local stationarity setting which introduces a rescaled time $z = t/T \in (0,1)$
on which $W_j(z)$ is defined. The rescaled time permits increasing amounts of
data about the local structure of $W_j(z)$ to be collected as the observed time
$T$ tends to infinity. Even though a given LSW process does not determine
the sequence $\{w_{jk;T}\}$ uniquely, the model allows the (asymptotic) identification
of the model coefficients determined by the uniquely defined $W_j(z)$. Then,
the evolutionary wavelet spectrum of an LSW process $\{X_{t,T}\}_{t=0,\ldots,T-1}$, with
respect to $\psi$, is given by

$$S_j(z) = |W_j(z)|^2, \qquad z \in (0,1), \tag{2.5}$$

and is such that, by definition of the process, $\sum_{j=-\infty}^{-1} S_j(z) < \infty$ uniformly
in $z \in (0,1)$.

The evolutionary wavelet spectrum Sj(z) is related to the time-dependent 
autocorrelation function of the LSW process. Observe that the autocovari- 
ance function of an LSW process can be written as 

$$c_{X,T}(z,\tau) = \operatorname{Cov}(X_{[zT],T},\,X_{[zT]+\tau,T})$$

for $z \in (0,1)$ and $\tau$ in $\mathbb{Z}$, and where $[\cdot]$ denotes the integer part of a real
number. The next result shows that this autocovariance converges asymptotically
to a local autocovariance defined by

$$c_X(z,\tau) = \sum_{j=-\infty}^{-1} S_j(z)\,\Psi_j(\tau), \tag{2.6}$$

where $\Psi_j(\tau) = \sum_{k=-\infty}^{\infty} \psi_{jk}(0)\,\psi_{jk}(\tau)$ is the autocorrelation wavelet function.

Proposition 1. Under the assumptions of Definition 1, if $T \to \infty$, then

$$\sum_{\tau=-\infty}^{\infty}\int_0^1 dz\,|c_{X,T}(z,\tau) - c_X(z,\tau)| = O(T^{-1})$$

for every LSW process.

Appendix A presents some properties of the autocorrelation wavelet sys- 
tem appearing in (2.6). Like wavelets themselves, this system enjoys good 
localization properties. Consequently, we observe that equation (2.6) is a 
multiscale decomposition of the autocovariance structure of the process over 
time: the larger the wavelet spectrum $S_j(z)$ is at a particular scale $j$ and
point $z$ in the rescaled time, the more dominant is the contribution of scale $j$
in the variance at time z. Thus, the evolutionary wavelet spectrum describes 
the distribution of the (co)variance at a particular scale and time location. 
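
To make the decomposition (2.6) concrete, the sketch below computes Haar autocorrelation wavelets and evaluates the local autocovariance for a spectrum given at one fixed rescaled time. The function names and the dictionary representation of the spectrum are assumptions made for this illustration only; they do not come from the paper.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = -1, -2, ... (vector of length 2**(-j))."""
    half = 2 ** (-j - 1)
    return 2.0 ** (j / 2) * np.concatenate([np.ones(half), -np.ones(half)])

def autocorrelation_wavelet(j):
    """Psi_j(tau) for tau = -(L-1), ..., L-1 with L = 2**(-j); lag 0 is at index L-1."""
    psi = haar_wavelet(j)
    return np.correlate(psi, psi, mode="full")

def local_autocovariance(spectrum, tau):
    """Sum over j of S_j(z) * Psi_j(tau), for a dict {j: S_j(z)} at one fixed z."""
    total = 0.0
    for j, s in spectrum.items():
        Psi = autocorrelation_wavelet(j)
        centre = 2 ** (-j) - 1           # index of lag 0
        if abs(tau) <= centre:           # Psi_j vanishes outside its support
            total += s * Psi[centre + tau]
    return total

print(autocorrelation_wavelet(-1))       # values at lags -1, 0, 1
spectrum_at_z = {-1: 1.0, -2: 0.5}       # hypothetical values of S_j(z)
print(local_autocovariance(spectrum_at_z, 0))
print(local_autocovariance(spectrum_at_z, 1))
```

Since $\Psi_j(0) = 1$ at every scale, the lag-zero value is simply the sum of the $S_j(z)$, which is the local variance interpretation given in the text.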

Moreover, we recall in Appendix A that $\{\Psi_j\}$ is a linearly independent
system. Therefore, since the autocovariance function converges to the local 
autocovariance in the sense of Proposition 1, the coefficients Sj(z) in (2.6) 
are asymptotically the unique wavelet representation of the second order 
structure of the time series. 

It is worth mentioning that a stationary process with an absolutely summable 
autocovariance function is an LSW process [Nason et al. (2000), Proposition 
3]. Stationarity is characterized by a wavelet spectrum which is constant over 
time: $S_j(z) = S_j$ for all $z \in (0,1)$. However, our motivation for studying LSW
processes lies in the modeling of time-varying spectra. The regularity of the
wavelet spectrum in time is determined by the smoothness of Wj(z) with 
respect to z. In Nason et al. (2000), this function is assumed to be Lipschitz 
continuous in time. In our definition of LSW processes, we only require the 
total variation of Wj to be bounded. This weaker assumption is considered 
not only in order to work with less strict assumptions, but also to allow a 
discontinuous evolution of the wavelet spectrum in time. Figure 1 shows a 
simulated example of such a nonstationary process. 
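
A process of the kind described here can be simulated directly from representation (2.1). The sketch below is an illustration under explicit assumptions: it takes $w_{jk;T} = W_j(k/T)$ exactly, uses Haar wavelets and Gaussian increments, and uses a hypothetical piecewise-constant spectrum that is not the one shown in the paper's figure. The name `simulate_lsw` is invented for this example.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = -1, -2, ... (vector of length 2**(-j))."""
    half = 2 ** (-j - 1)
    return 2.0 ** (j / 2) * np.concatenate([np.ones(half), -np.ones(half)])

def simulate_lsw(W, xi):
    """Build X_{t,T} = sum_j sum_k W_j(k/T) psi_{jk}(t) xi_{jk}.

    W  : dict {j: vectorized function z -> W_j(z)}
    xi : dict {j: array of length T holding the increments xi_{jk}}
    """
    T = len(next(iter(xi.values())))
    z = np.arange(T) / T
    X = np.zeros(T)
    for j, Wj in W.items():
        psi = haar_wavelet(j)
        d = Wj(z) * xi[j]                     # amplitude times increment at each k
        for m, psi_m in enumerate(psi):       # psi_{jk}(t) = psi_{j,k-t}, so k = t + m
            if m >= T:
                break
            X[:T - m] += psi_m * d[m:]
    return X

# Hypothetical spectrum with a jump at z = 0.5: power moves from scale -1 to scale -3.
W = {
    -1: lambda z: np.where(z < 0.5, 1.0, 0.0),
    -3: lambda z: np.where(z >= 0.5, 1.0, 0.0),
}
T = 1024
rng = np.random.default_rng(0)
xi = {j: rng.standard_normal(T) for j in W}
X = simulate_lsw(W, xi)
print(X.shape)
```

The amplitudes here are discontinuous in $z$ but have bounded total variation, so the example falls under the total variation condition of Definition 1 rather than a Lipschitz condition.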

3. A first estimator of the wavelet spectrum. 

3.1. The corrected wavelet periodogram. An estimator of the wavelet
spectrum is constructed by taking the squared empirical coefficients from
the nondecimated transform:

$$I_{j;T}\Big(\frac{k}{T}\Big) = \Big(\sum_{t=0}^{T-1} X_{t,T}\,\psi_{jk}(t)\Big)^2, \qquad j = -1,\ldots,-\log_2 T;\ k = 0,\ldots,T-1.$$

$I_{j;T}(z)$ is called the wavelet periodogram, as it is analogous to the formula
for the classical periodogram in traditional Fourier spectral analysis of stationary
processes [Brillinger (1975)].

Some asymptotic properties of this estimator have been studied by
Nason et al. (2000), who showed that the wavelet periodogram is not an
asymptotically unbiased estimator of the wavelet spectrum. Indeed, Proposition
4 of Nason et al. (2000) states that for all fixed scales $j < 0$,

$$E\,I_{j;T}(z) = \sum_{\ell=-\log_2 T}^{-1} A_{j\ell}\,S_\ell(z) + O(2^{-j}/T) \tag{3.1}$$

uniformly in $z \in (0,1)$, where the matrix $A = (A_{j\ell})_{j,\ell<0}$ is defined by

$$A_{j\ell} := \langle \Psi_j, \Psi_\ell \rangle = \sum_{\tau} \Psi_j(\tau)\,\Psi_\ell(\tau).$$

Fig. 1. The upper figure is an example of theoretical spectrum $S_j(z)$. This spectrum is
used in the lower figure to simulate a locally stationary wavelet process of length T = 1000.
This simulation uses Gaussian innovations and nondecimated Haar wavelets.

Note that the matrix $A$ is not simply diagonal since the autocorrelation
wavelet system $\{\Psi_j\}$ is not orthogonal. Nason et al. (2000) proved the invertibility
of $A$ if $\{\Psi_j\}$ is constructed using Haar wavelets. If other compactly
supported wavelets are used, numerical results suggest that the invertibility 
of A still holds, but a complete proof of this result has not yet been estab- 
lished. As we need the invertibility of A in results which follow, we hereafter 
restrict ourselves to Haar wavelets, but conjecture that all results remain 
valid for more general Daubechies wavelets [Daubechies (1992)]. 
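
The matrix $A$ of inner products of autocorrelation wavelets can be inspected numerically for the Haar case. The sketch below is an illustration only: it builds the top-left $J \times J$ block for the $J$ finest scales and checks positive definiteness for that finite block, which is a numerical observation and not a proof of invertibility. The function names are assumptions made for this example.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = -1, -2, ... (vector of length 2**(-j))."""
    half = 2 ** (-j - 1)
    return 2.0 ** (j / 2) * np.concatenate([np.ones(half), -np.ones(half)])

def inner_product_matrix(J):
    """Entry (a, b) is the sum over tau of Psi_{-(a+1)}(tau) * Psi_{-(b+1)}(tau)."""
    Psi = []
    for i in range(J):
        psi = haar_wavelet(-(i + 1))
        Psi.append(np.correlate(psi, psi, mode="full"))   # lags -(L-1), ..., L-1
    width = len(Psi[-1])                                   # coarsest scale is widest
    # centre every Psi_j on lag 0 by zero-padding symmetrically to a common width
    padded = np.array([np.pad(P, (width - len(P)) // 2) for P in Psi])
    return padded @ padded.T

A = inner_product_matrix(3)
print(A)
print(np.linalg.eigvalsh(A))   # all eigenvalues positive for this finite block
```

The off-diagonal entries are nonzero, which is the non-orthogonality of $\{\Psi_j\}$ noted in the text.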

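Equation (3.1) motivates the definition of a corrected wavelet periodogram,

$$L_{j;T}\Big(\frac{k}{T}\Big) = \sum_{\ell=-\log_2 T}^{-1} (A_T^{-1})_{j\ell}\Big(\sum_{t=0}^{T-1} X_{t,T}\,\psi_{\ell k}(t)\Big)^2, \tag{3.2}$$

where $A_T = (A_{j\ell})_{-\log_2 T \le j,\ell \le -1}$. The corrected wavelet periodogram $L_{j;T}$ is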
a preliminary tool for constructing an asymptotically consistent estimator 
of the evolutionary wavelet spectrum. To this end, it needs to be smoothed 
in time. This question is addressed in the following. 
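
The correction step can be sketched as a linear solve applied to the raw periodogram at each time point. This is a minimal illustration under stated assumptions: Haar wavelets, only the $J$ finest scales (so the $J \times J$ block of $A$ stands in for $A_T$), and observations before $t = 0$ treated as zero. The function names are invented for this example and no smoothing in time is performed.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = -1, -2, ... (vector of length 2**(-j))."""
    half = 2 ** (-j - 1)
    return 2.0 ** (j / 2) * np.concatenate([np.ones(half), -np.ones(half)])

def inner_product_matrix(J):
    """J x J block of inner products of Haar autocorrelation wavelets."""
    Psi = []
    for i in range(J):
        psi = haar_wavelet(-(i + 1))
        Psi.append(np.correlate(psi, psi, mode="full"))
    width = len(Psi[-1])
    padded = np.array([np.pad(P, (width - len(P)) // 2) for P in Psi])
    return padded @ padded.T

def wavelet_periodogram(X, J):
    """Row i holds the squared coefficients at scale j = -(i+1), for k = 0, ..., T-1."""
    T = len(X)
    I = np.empty((J, T))
    for i in range(J):
        psi = haar_wavelet(-(i + 1))
        # sum_t X_t psi_{j,k-t} is a convolution of X with psi evaluated at k
        d = np.convolve(X, psi)[:T]
        I[i] = d ** 2
    return I

def corrected_periodogram(X, J):
    """Apply the inverse of the inner product matrix across scales at each time point."""
    A = inner_product_matrix(J)
    return np.linalg.solve(A, wavelet_periodogram(X, J))

rng = np.random.default_rng(1)
X = rng.standard_normal(256)            # white noise as a simple test input
L = corrected_periodogram(X, 3)
print(L.shape)
print(L.mean(axis=1))                   # crude time averages per scale
```

Individual values of the corrected periodogram can be negative because the inverse matrix has negative entries, which is one reason a further averaging or smoothing step in time is needed.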




Remark 1. The asymptotic bias of the wavelet periodogram is a consequence
of the overcompleteness of the nondecimated wavelet system $\{\psi_{jk}\}$.
One could ask if it would not be easier to define LSW processes using a
decimated wavelet system because, for this system, the matrix $A$ reduces to
the identity. Unfortunately, the answer is negative: the use of decimated
wavelets, as described in von Sachs et al. (1997), would not allow the local
autocovariance function to be written as a wavelet-type transform of an evolutionary
spectrum, as in (2.6). Moreover, classical stationary processes are
not included in the model based on decimated wavelets.

3.2. The preliminary estimator and its properties. Suppose we want to
estimate $S_j(z_0)$ from observations $\underline{X}_T = (X_{0,T},\ldots,X_{T-1,T})'$. The estimator
studied below takes the following form:

$$Q_{j,\mathcal{R};T} = \frac{1}{|\mathcal{R}T|}\sum_{k \in \mathcal{R}T}\Big(L_{j;T}\Big(\frac{k}{T}\Big) + Z_{jk;T}\Big), \tag{3.3}$$



where $Z_{jk;T}$ are i.i.d. Gaussian random variables of mean zero and variance
$C^2 2^{-j}$, independent from $\underline{X}_T$ for a given constant $C^2$, $\mathcal{R}$ is an interval in $(0,1)$
that contains the point $z_0$ and $k \in \mathcal{R}T$ means that $k/T \in \mathcal{R}$. The estimator
(3.3) is essentially the average of the corrected wavelet periodogram over the
interval $\mathcal{R}$. The reason for adding a "noise process" $Z_{jk;T}$ in our estimator is
for the sake of regularization, since the process $\underline{X}_T$ is not guaranteed to be
invertible. In other words, the presence of the additive Gaussian variable in
the estimator $Q_{j,\mathcal{R};T}$ allows consistent estimation of more general processes
for which the wavelet spectrum $S_j(z)$ is not bounded away from zero. Note
that this regularization technique does not add any systematic bias to the
resulting estimator since in (3.3), an average is taken over the zero-mean
Gaussian variables $Z_{jk;T}$. That procedure is analogous to the regularization
techniques for ill-posed inverse problems such as, for instance, in ridge regression
or Tikhonov regularization; see also Neumann (1996) for a similar
technique in the context of stationary time series.

Of course, the choice of the interval $\mathcal{R}$ around $z_0$ is crucial in this estimation.
This question will be addressed in the next section. First, we derive
some useful properties of $Q_{j,\mathcal{R};T}$ as an estimator of

$$Q_{j,\mathcal{R}} = \frac{1}{|\mathcal{R}|}\int_{\mathcal{R}} S_j(z)\,dz. \tag{3.4}$$



The statistical properties of $Q_{j,\mathcal{R};T}$ are now derived under a set of assumptions.

Assumption 1. The autocovariance function $c_{X,T}$ and the local autocovariance
function $c_X$ of the LSW process are such that

$$\|c_{X,T}\|_{1,\infty} := \sum_{\tau=-\infty}^{\infty}\sup_{z \in (0,1)} |c_{X,T}(z,\tau)| \tag{3.5}$$

is bounded independently of $T$ and

$$\|c_X\|_{1,\infty} := \sum_{\tau=-\infty}^{\infty}\sup_{z \in (0,1)} |c_X(z,\tau)| < \infty. \tag{3.6}$$

This assumption is needed to control the spectral norm of the covariance
matrix of the process (Lemma B.3 in Appendix B). For a stationary process,
it reduces to absolute summability of the autocovariance of the process
(short memory property).

Assumption 2. There exists an $\varepsilon > 0$ such that for all $z \in (0,1)$,

$$\sum_{j=-\infty}^{-1} S_j(z) \ge \varepsilon.$$

According to equation (2.6), the sum over scales of $S_j(z)$ is the local
variance of the process at time $[zT]$ and this assumption states that the
local variance of the process is bounded away from zero.

Assumption 3. The increment process $\{\xi_{jk}\}$ in Definition 1 is Gaussian.

This assumption allows substantial simplifications in the proofs. It is also
assumed to establish some results in Nason et al. (2000) and Fryzlewicz et al.
(2003).

Assumption 4. The evolutionary wavelet spectrum $S_j(z)$ is such that

$$\sum_{j=-\infty}^{-\log_2 T - 1}\sup_{z \in (0,1)} S_j(z) = O(T^{-1}).$$

In the definition of the corrected wavelet periodogram (3.2), all scales
$0 > j > -\infty$ are implicitly included due to the definition of $X_{t,T}$. The last
assumption is used in order to control the remainder of the estimation bias
at all scales lower than $-\log_2 T$.

The following proposition describes the asymptotic properties of $Q_{j,\mathcal{R};T}$.

Proposition 2. Suppose Assumptions 1-4 hold true. For all LSW processes
(Definition 1) and all $\mathcal{R} \subset (0,1)$,

EQj,n;T - Qj,n 



[RT\ 



m=- log 2 T 



£ m TV(5 m )+0(2^2|^ T |- 1 ) 



(3.7) 
for all $j = -1,\ldots,-J_T$ with $J_T = O(\log_2 T)$ and where $K_0$ is a constant
independent of $j$, $T$ and $|\mathcal{R}|$. Moreover, under Assumptions 1-4, the variance
$\sigma^2_{j,\mathcal{R},T} = \operatorname{Var} Q_{j,\mathcal{R};T}$ is such that

<^ 2 7?.T< [C 2 + 



fRT\- U ^ T -\r \Tl\J \KT\ 

for all T, for all j = — 1,...,— Jt with Jt = or(log 2 T) and where c 2 = 
c ^llioo with Ki a constant that depends only on the wavelet ip. 

The proof of this proposition is in Appendix B.3. Note that the squared
bias and the variance of the estimator have the same rate of convergence.
This phenomenon is due to the nonstationary behavior of the process. Indeed,
for a stationary process, the total variation of $S_m$ is zero at all scales
and the rate of the bias is then $T^{-1}$. This is not the case for a general nonstationary
process: when the wavelet spectrum is not constant over time,
an additional term resulting from nonstationarity considerably reduces this
rate of convergence. Moreover, even if we are dealing with a local estimator
of the wavelet spectrum at a fixed scale $j < 0$ and a fixed time interval
$\mathcal{R}$, the nonstationarity term in the bias involves the variation of the global
wavelet spectrum. This may be observed in equation (3.7), which involves
a sum over all scales $m = -1,\ldots,-\log_2 T$ and the total variation of all $S_m$
over the whole rescaled time interval $(0,1)$.

This slow rate of convergence of the bias poses a problem for the establishment
of the asymptotic normality of $Q_{j,\mathcal{R};T}$. In the next proposition, we
circumvent this problem and derive a nonasymptotic exponential bound for
the deviation of $Q_{j,\mathcal{R};T}$.

Proposition 3. Assume that Assumptions 1-4 hold. If $\sigma^2_{j,\mathcal{R},T} = \operatorname{Var} Q_{j,\mathcal{R};T}$,
then for all $\eta > 0$ and all scales $j = -1,\ldots,-J_T$, where $J_T = O(\log_2 T)$,



Pr(\Qj,n-,T ~ Qj,n\ > 2o-j,K,Tri) 

2r)Lj 

■ 

16 



< c exp<! ■ rj 2 



+ 



\TZT\aj : Ti,T 
2^ 2 77(^2||cx||i,cx> + #3) 



\TZ\VTaj t Tz,T 



with the positive constants cq = 3 + e, Ki as in Proposition 2 and K3 de- 
pending on the wavelet ip and the constants p,C given in Definition 1. 

The proof of this proposition appears in Appendix B.4. This proposition
gives a nonasymptotic approximation for the deviation of $Q_{j,\mathcal{R};T}$. This result
is exploited in the next section in order to choose the interval $\mathcal{R}$ in an
adaptive way. From an asymptotic viewpoint, that is, as $T \to \infty$, we note
that this exponential bound does not tend to zero, meaning that the standardized
statistic $Q_{j,\mathcal{R};T}$ is asymptotically nondegenerate. This phenomenon
is well known in the context of pointwise estimation; see Lepski (1990) and
Brown and Low (1996). In order to have a consistent result when $T \to \infty$,
it is then necessary to require that $\eta = \eta_T$ grows with $T$. The appropriate
rate for $\eta_T$ is derived in the next corollary. The proof is given in Appendix
B.4 and is essentially based on the bounds derived in Proposition 2.

Corollary 1. Under the assumptions of Propositions 2 and 3, if $k_T$
tends to infinity and is such that $J_T \cdot \exp(-k_T) = o_T(1)$, then there exists a
$T_0 > 1$ such that for all $T > T_0$,

$$\Pr\Big(\sup_{-J_T \le j < 0} |Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > k_T\sqrt{(1 + c^2/|\mathcal{R}|)/|\mathcal{R}T|}\Big) = o_T(1),$$

where $c^2$ is as in the assertion of Proposition 2.

Remark 2. An example of admissible rates is $J_T \sim \log_2 T$ and $k_T \sim
\log_2 T$. The sequence $k_T$ will play a crucial role in Section 4.

Remark 3. The results are proved under the assumption that the incre- 
ments considered in the definition of LSW processes are Gaussian (Assump- 
tion 3). This assumption allows substantial simplifications in the proofs. 
For practical applications, we believe that this assumption is not unrealistic 
and the class of Gaussian LSW processes is rich enough, as can be ob- 
served from the wide range of applications that are treated in Nason et al. 
(2000), Fryzlewicz et al. (2003), Oh et al. (2003), Woyte et al. (2007) and 
Van Bellegem and von Sachs (2004), for instance. However, it still seems in- 
teresting to see how the above results can be extended to the non-Gaussian 
case. A careful reading of the proof of Proposition 3 shows that the cru- 
cial point is to establish an exponential inequality for quadratic forms of 
the increments. In our proof of Proposition 3, we use the inequality es- 
tablished by Dahlhaus and Polonik (2002) on the quadratic form of Gaus- 
sian random variables. Other exponential inequalities have been established 
for non-Gaussian random variables; see, for instance, Dahlhaus (1988) or 
Spokoiny (2001, 2002). Another example of an exponential inequality for 
dependent data is derived in van de Geer (2002).

3.3. Estimation of the variance. The main drawback of Proposition 3 is
that the deviation result depends on the variance $\sigma^2_{j,\mathcal{R},T} = \operatorname{Var} Q_{j,\mathcal{R};T}$, which
is typically unknown. The goal of the following derivation is to propose a
preliminary estimator $\tilde\sigma^2_{j,\mathcal{R},T}$ of $\sigma^2_{j,\mathcal{R},T}$ such that Proposition 3 can still be
used with $\tilde\sigma^2_{j,\mathcal{R},T}$.

The variance $\sigma^2_{j,\mathcal{R},T}$ depends on the unknown autocovariance function of
the LSW process in the following way [see Lemma B.1 with equation (B.9)]:

2 11/ 1 1 2 2^ 

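where $\Sigma_T$ is the $T \times T$ (non-Toeplitz) covariance matrix of the LSW process
$(X_{0,T},\ldots,X_{T-1,T})'$, and $U_{j,\mathcal{R};T}$ is the $T \times T$ matrix with entry $(s,t)$ equal
to

$$u_{st} = |\mathcal{R}T|^{-1}\sum_{\ell=-\log_2 T}^{-1} (A_T^{-1})_{j\ell}\sum_{k \in \mathcal{R}T} \psi_{\ell k}(s)\,\psi_{\ell k}(t).$$

We also denote by $\sigma_{s,s+u}$ the entry $(s,s+u)$ of the matrix $\Sigma_T$.
We will estimate $\sigma^2_{j,\mathcal{R},T}$ by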

~ 2 ii/ ~ 1 1 2 

a j,n,T = 2\\Uj ! i Z . T Y lT \\ 2 + TjjTjTn 

where $\tilde\Sigma_T$ is an estimate of the covariance matrix $\Sigma_T$. A first idea is to define
the elements $\tilde\sigma_{s,s+u}$ of $\tilde\Sigma_T$ by plugging $Q_{j,\mathcal{R};T}$ into the local autocovariance
function (2.6), that is,

$$\tilde\sigma_{s,s+u} = \sum_{j=-\log_2 T}^{-1} Q_{j,\mathcal{R}(s);T}\,\Psi_j(u),$$

where $\mathcal{R}(s)$ denotes an interval which contains the time point $s/T$. However,
the convergence in probability of $\tilde\sigma_{s,s+u}$ to $\sigma_{s,s+u}$ is not faster than the rate
of $\sigma_{s,s+u}$ itself and we need to modify the estimator in the following two
ways.

(i) Assumption 1 indicates that the covariance $|\sigma_{s,s+u}|$ is small for large
$|u|$. We set $\tilde\sigma_{s,s+u}$ to zero when $|u| > M_T$ for an appropriate sequence
$M_T$ tending to infinity with $T$.

(ii) It is necessary to control the distance in rescaled time between the spectrum
$S_j(z)$, for $z \in \mathcal{R}(s)$, and $S_j(s/T)$. To do so, we allow the window
$\mathcal{R}(s)$ to depend on $T$, which is denoted by $\mathcal{R}_T(s)$, in such a way that its
length $|\mathcal{R}_T|$ shrinks to zero when $T$ tends to infinity. This is analogous
to the estimation of a regression function by kernel smoothing, where
the window usually depends on the length of the data set.

With these two ingredients, we propose to estimate $\sigma_{s,s+u}$ by

$$\tilde\sigma_{s,s+u} = \sum_{j=-\log_2 T}^{-1} Q_{j,\mathcal{R}_T(s);T}\,\Psi_j(u)\,I_{|u| \le M_T} \tag{3.8}$$

and the following assumption makes precise the appropriate rates for the
sequences $|\mathcal{R}_T|$ and $M_T$.

Assumption 5. The sequence $J_T$ is such that $J_T = o_T(\log_2 T)$. The
length of TZt tends to zero such that 2 Jt \TZt\ = &r(l)- The sequence kx 
(which appears in Corollary 1) tends to infinity such that Jt exp(— hr y\Rr\) = 
or(l)- Finally, the sequence My [involved in the preliminary estimator for 
the variance — see (3.8)] tends to infinity such that 

2 jT \n T \~ l T~ 1/2 M T k T \oglT = o T {\). 

Admissible rates for this last assumption are, for example, $J_T \sim \log_2\log_2 T$,
$k_T \sim \log_2 T$, $|\mathcal{R}_T| \sim \log_2^{-3} T$ and $M_T \sim \log_2^{\alpha} T$ with $\alpha > 0$. It is worth mentioning
that with this assumption, $|\mathcal{R}_T|$ shrinks to zero in the rescaled time,
whereas in the observed time, the interval length $|T\mathcal{R}_T|$ tends to infinity.
This means that our estimate of $S_j(s/T)$ is built using an increasing amount
of data in the observed time, but, at the same time, with an average around
$S_j(s/T)$ in the rescaled time on a shrinking segment around $s/T$.

The next proposition shows that on the random set where the estimator
$Q_{j,\mathcal{R}_T(s);T}$ is near $Q_{j,\mathcal{R}_T(s)}$, the estimator (3.8) has a good quality. Our proof
of this proposition may be found in Appendix B.5 and needs the following
technical assumption, which is a slightly stronger condition than point 2(a)
of Definition 1, in the sense that we need to control the decay of $S_j(z)$ with
respect to $j$ and uniformly in $z$.

Assumption 6. The local autocovariance function $c_X(z,\tau)$ is such that

$$\sum_{u=-\infty}^{\infty}\sup_{z} |c_X(z,u)|\,I_{|u| > M_T} = o_T(2^{-J_T}).$$

This last assumption on the decay of the local autocovariance function
uniformly in $z$ is very sensible in the context of short-memory stationary
processes [in that case, $c_X(z,u)$ does not depend on $z$]. With the rates specified
above, a typical condition is to assume $|c_X(z,u)| \le c \cdot r^{|u|}$ uniformly in $z \in
(0,1)$ with $0 < r < 1$.
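
To see why a geometric bound of this kind is enough, the following short derivation bounds the tail sum of Assumption 6. It is a sketch under the stated assumptions: $c$ and $r$ are the constants of the typical condition above, $M_T$ is treated as an integer, and the rates are the admissible ones quoted after Assumption 5.

```latex
\sum_{|u|>M_T}\sup_{z}|c_X(z,u)|
  \le 2c\sum_{u=M_T+1}^{\infty} r^{u}
  = \frac{2c\,r^{M_T+1}}{1-r}.
% With 2^{J_T}\sim\log_2 T and M_T\sim\log_2^{\alpha}T,\ \alpha>0:
2^{J_T}\,r^{M_T}
  \sim \exp\bigl(\ln\log_2 T - |\ln r|\,\log_2^{\alpha}T\bigr)
  \longrightarrow 0 \quad (T\to\infty),
% so the tail sum is o_T(2^{-J_T}), as Assumption 6 requires.
```

The key point is that any positive power of $\log_2 T$ eventually dominates $\ln\log_2 T$, so the truncation lag $M_T$ may grow very slowly.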

Proposition 4. Suppose Assumptions 1-6 hold. There then exists a
positive number $T_0$ and a random set $\mathcal{A}$ independent of $j$ and such that
$\Pr(\mathcal{A}) \ge 1 - o_T(1)$ and

\Qj,n T (s);T -Qj,n T (s)\ < ^llcxIkoo&rv^lTe/rrr 1 
for all T >Tq. Moreover, on A, 

(3-9) J Trrl R I - a\^ T \ = o P (l) 

holds for all $j = -1,\ldots,-J_T$, where $o_P(1)$ does not depend on $\mathcal{R}$.

Finally, Proposition 4 together with Proposition 3 leads to the following
result, which will be used to construct the pointwise adaptive estimator in 
Section 4. 



Theorem 1. Suppose Assumptions 1-6 hold. There then exist a sequence
$o_T(1)$ and a positive number $T_0$ such that for all $T > T_0$,



Pr{\Qj,K;T - Qj,n\ > Zdj^H) 



1 + 



< c exp<j - — • rj 



2nL j 



+ 



7ZT\o-j t K,T 

V/ 2 7l{K 2 \\cx\\l,oo+K 3 ) 



+ o T (l) 



\lZ\y/To-j t ii,T 

for all j = —1,...,—Jt, where 7/ = 77-y/l — 77 1 and the positive constants 
cq,K2,K3 are defined as in the assertion of Propositions 2 and 3. 



Remark 4. Theorem 1 gives an approximation of the distribution of the
normalized loss $|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}|/\tilde\sigma_{j,\mathcal{R},T}$. This depends on the unknown quantities
$\|c_X\|_{1,\infty}$ and $\rho$ [cf. (2.4)]. These two quantities may be understood as
nuisance parameters of the problem, depending on the global spectrum. Their
estimation is based on a preliminary smoothing of $L_{j;T}(z)$
with respect to $z$, which we denote by $L^*_{j;T}(z)$. Here, we have in mind a
kernel smoothing procedure or a wavelet shrinkage as studied in
Nason et al. (2000). A preliminary estimate of $\|c_X\|_{1,\infty}$ is then obtained by
plugging $L^*_{j;T}(z)$ into $\|c_X\|_{1,\infty}$ [cf. (2.6) and (3.6)]. Next, the preliminary
estimation of $\rho$ requires the estimation of $\mathrm{TV}(S_j)$ [cf. (2.3)]. We estimate
$\mathrm{TV}(S_j)$ by

$$\sum_i \bigl|L^*_{j;T}(z_i^{\max}) - L^*_{j;T}(z_i^{\min})\bigr| + \bigl|L^*_{j;T}(z_i^{\max}) - L^*_{j;T}(z_{i+1}^{\min})\bigr|,$$

where the sum is over the local minima and maxima of $L^*_{j;T}(z)$, with $z_i^{\min} < z_i^{\max} < z_{i+1}^{\min}$
for all $i$.
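The total-variation estimate of Remark 4 can be sketched on a discretized curve as follows. This is only an illustration under the assumption that the smoothed periodogram is available as a finite list of values on a time grid; the function name is hypothetical. The sketch collects the turning points (treating plateaus as part of the surrounding monotone stretch) and sums the absolute jumps between consecutive extrema.

```python
def total_variation_from_extrema(values):
    """Sum of |differences| between consecutive local extrema of a sequence.

    `values` is assumed to be a non-empty list of smoothed periodogram
    values on an increasing time grid (an assumption of this sketch).
    """
    extrema = [values[0]]
    direction = 0  # +1 while increasing, -1 while decreasing, 0 at the start
    for prev, cur in zip(values, values[1:]):
        step = (cur > prev) - (cur < prev)
        if step == 0:
            continue  # flat stretch: no change of direction
        if direction != 0 and step != direction:
            extrema.append(prev)  # `prev` is a local maximum or minimum
        direction = step
    extrema.append(values[-1])
    return sum(abs(b - a) for a, b in zip(extrema, extrema[1:]))
```

On a discrete grid this coincides with the sum of absolute successive differences, which gives a simple consistency check for the extrema-based formula.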

Remark 5. The estimator (3.3) also involves a constant $C^2$. In view of
Proposition 2 on the variance of the estimator, that constant should ideally
be close to $c^2 = 2K_2\|c_X\|^2_{1,\infty}$. Because $\|c_X\|_{1,\infty}$ is unknown, it is estimated
in practice by $\sum_u \sup_s |\tilde\sigma_{s,s+u}|$.



4. Pointwise adaptive estimation. The question arises of how to choose the
best segment $\mathcal{R}$ in the estimator (3.3). The goal of this section is
to provide a data-driven procedure that selects $\mathcal{R}$ automatically.

The proposed method goes back to the pointwise adaptive estimation
theory of Lepski (1990); see also Lepski and Spokoiny (1997) and Spokoiny
(1998). Suppose that the wavelet spectrum $S_j(z_0)$ is well approximated by
the averaged spectrum $Q_{j,\mathcal{U}}$ for a given interval $\mathcal{U}$ containing the reference
point $z_0$. The idea of the procedure is to consider a second interval $\mathcal{R}$ containing
$\mathcal{U}$ and to test whether $Q_{j,\mathcal{R}}$ differs significantly from $Q_{j,\mathcal{U}}$. As we
describe below, this test procedure is based on Proposition 3 or Theorem 1.
If there exists a subset $\mathcal{U}$ of $\mathcal{R}$ such that $|Q_{j,\mathcal{R}} - Q_{j,\mathcal{U}}|$ is significantly different
from zero, then we reject the hypothesis of homogeneity of the wavelet
spectrum $S_j(z)$ on $z \in \mathcal{R}$. Finally, the adaptive estimator corresponds to the
largest interval $\mathcal{R}$ such that the hypothesis of homogeneity is not rejected.

This section contains a formal description of this algorithm and derives 
some properties of the estimator. 

4.1. Testing homogeneity. Let $\mathcal{R}$ be an interval containing $z_0$, $\mathcal{U}$ a subset
of $\mathcal{R}$ and define



and where the sequence $k_T$ is such that $J_T \cdot \exp(-k_T) = o_T(1)$ (see Corollary 1).
Under the assumption that the wavelet spectrum $S_j$ is homogeneous
on the segment $\mathcal{R}$, the difference $\Delta_j(\mathcal{R},\mathcal{U})$ is negligible. Then, as a test rule,
we reject the homogeneity hypothesis on $\mathcal{R}$ if there exists a subset $\mathcal{U} \subset \mathcal{R}$
such that $|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{U};T}| > \eta(\sigma_{j,\mathcal{R},T} + \sigma_{j,\mathcal{U},T})k_T$ for a given $\eta$.

In the case where the variances $\sigma_{j,\mathcal{R},T}$ and $\sigma_{j,\mathcal{U},T}$ are unknown, they may
be estimated as in Section 3.3 above.

In practice, we choose a set $A$ of interval candidates $\mathcal{R}$. Then, for each
candidate $\mathcal{R}$, we apply the homogeneity test with respect to a given set $\wp(\mathcal{R})$
of subintervals $\mathcal{U}$ of $\mathcal{R}$.

Assumption 7. In the estimation procedure described below, we assume
the following properties of the test sets $A$ and $\wp(\mathcal{R})$:

1. for all $\mathcal{R}$, the shortest interval of $\wp(\mathcal{R})$ is of length at least $\delta > 0$;



2. the cardinality of p(TZ) is such that ${p{TZ)) < ^T^^^^ 1 '^^ 
for some $0 < \alpha < 1$;




with 
3. when we test the homogeneity of the wavelet spectrum on $\mathcal{R}$, we assume
that there exists a subinterval $\mathcal{U} \in \wp(\mathcal{R})$ such that $\mathcal{U} \subset \mathcal{R}$ and $\mathcal{U}$ contains

Remark 6 (Test sets). We give an example of sets $A$ and $\wp(\mathcal{R})$. For
each scale $j < 0$, the corrected wavelet spectrum (3.2) is evaluated on a grid
$k/T$, $k = 0, \ldots, T-1$, in time. We can then choose the set $A$ as

$$A = \{[r_0/T, r_1/T] : r_0 < [z_0 T] < r_1\}$$

for $r_0, r_1 \in \{0, \ldots, T-1\}$. Nevertheless, in order to reduce the computational
effort, we shrink the cardinality of $A$ following the method of Spokoiny
(1998). More precisely, we first select two sets $\mathcal{K}_m = \{r_m : r_m < [z_0 T]\}$ and
$\mathcal{K}_n = \{r_n : r_n > [z_0 T]\}$ which both contain fewer than $T$ points and then set

$$A = \{[r_m/T, r_n/T] : r_m \in \mathcal{K}_m, r_n \in \mathcal{K}_n\}.$$

Then, one possibility for defining $\wp(\mathcal{R})$ is to consider

$$\wp(\mathcal{R}) = \{[r_-/T, r_+/T] : r_-, r_+ \in \mathcal{K}_m \cup \mathcal{K}_n\}.$$

We refer to Spokoiny (1998) for the details of this construction. 
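The construction of Remark 6 can be sketched as follows. The geometric spacing of the endpoints is a hypothetical choice made here only to obtain sets with far fewer than $T$ points; the remark does not prescribe it. Restricting the test intervals to those lying inside the candidate, with $r_- < r_+$, is likewise an assumption of this sketch, and all function names are illustrative.

```python
def geometric_offsets(limit, base=2):
    """Offsets 1, base, base**2, ... not exceeding `limit` (illustrative grid)."""
    offsets, step = [], 1
    while step <= limit:
        offsets.append(step)
        step *= base
    return offsets


def build_test_sets(T, z0):
    """Return (K_m, K_n, candidates, subintervals) around the index [z0 * T]."""
    i0 = int(z0 * T)
    # Left endpoints r_m < [z0 T] and right endpoints r_n > [z0 T].
    K_m = sorted(i0 - d for d in geometric_offsets(i0))
    K_n = sorted(i0 + d for d in geometric_offsets(T - 1 - i0))
    candidates = [(rm / T, rn / T) for rm in K_m for rn in K_n]

    def subintervals(interval):
        """Test intervals with endpoints in K_m or K_n lying inside `interval`."""
        lo, hi = interval
        grid = [r / T for r in sorted(set(K_m) | set(K_n)) if lo <= r / T <= hi]
        return [(a, b) for i, a in enumerate(grid) for b in grid[i + 1:]]

    return K_m, K_n, candidates, subintervals
```

With $T = 1000$ and $z_0 = 0.25$, this grid yields 8 left endpoints and 10 right endpoints, hence 80 candidate intervals instead of the roughly $250 \times 749$ intervals of the unrestricted set.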

4.2. The estimation procedure. The estimation procedure starts
with the smallest interval in $A$, assuming that the wavelet spectrum is
homogeneous on this short interval. It then iteratively selects longer intervals
in $A$ until the homogeneity assumption is rejected. Finally, the adaptive
segment $\hat{\mathcal{R}}$ is the longest segment $\mathcal{R}$ of $A$ for which the homogeneity
hypothesis is not rejected,

(4.2) $\hat{\mathcal{R}} = \arg\max\{|\mathcal{R}| \text{ such that } |Q_{j,\mathcal{R};T} - Q_{j,\mathcal{U};T}| \le \eta(\sigma_{j,\mathcal{R},T} + \sigma_{j,\mathcal{U},T})k_T \text{ for all } \mathcal{U} \in \wp(\mathcal{R})\}.$

The adaptive estimator of $S_j(z_0)$ is then defined by

(4.3) $\hat S_j(z_0) = Q_{j,\hat{\mathcal{R}};T}.$
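The selection rule (4.2)-(4.3) can be illustrated on a deliberately simplified surrogate problem. In the sketch below, the averaged spectrum $Q_{j,\mathcal{R};T}$ is replaced by the plain mean of a sequence `y` over an index interval, and its standard deviation by `noise_sd / sqrt(length)`. Both substitutions, the symmetric candidate intervals, the choice of test subintervals and all function names are assumptions made for illustration; this is not the wavelet-based estimator itself.

```python
import math


def interval_mean(y, interval):
    a, b = interval  # half-open index interval [a, b)
    return sum(y[a:b]) / (b - a)


def interval_sd(interval, noise_sd):
    a, b = interval
    return noise_sd / math.sqrt(b - a)


def candidate_subintervals(interval, min_len):
    """Left and right pieces of `interval` cut at multiples of `min_len`."""
    a, b = interval
    subs = []
    for m in range(a + min_len, b - min_len + 1, min_len):
        subs.append((a, m))
        subs.append((m, b))
    return subs


def is_homogeneous(y, interval, eta, k_T, noise_sd, min_len):
    """Surrogate of the test rule: reject if some subinterval mean is too far."""
    q_R = interval_mean(y, interval)
    s_R = interval_sd(interval, noise_sd)
    for sub in candidate_subintervals(interval, min_len):
        gap = abs(q_R - interval_mean(y, sub))
        if gap > eta * (s_R + interval_sd(sub, noise_sd)) * k_T:
            return False
    return True


def adaptive_interval(y, i0, half_widths, eta, k_T, noise_sd=1.0, min_len=2):
    """Longest nested candidate around index i0 reached before a rejection."""
    candidates = []
    for h in sorted(half_widths):
        cand = (max(0, i0 - h), min(len(y), i0 + h + 1))
        if cand not in candidates:
            candidates.append(cand)
    chosen = candidates[0]  # the smallest interval is accepted by convention
    for cand in candidates[1:]:
        if not is_homogeneous(y, cand, eta, k_T, noise_sd, min_len):
            break
        chosen = cand
    return chosen


def adaptive_estimate(y, i0, half_widths, eta, k_T, noise_sd=1.0, min_len=2):
    interval = adaptive_interval(y, i0, half_widths, eta, k_T, noise_sd, min_len)
    return interval_mean(y, interval)
```

For a noise-free step signal with a large jump at index 50 and reference index 20, the procedure stops at the last candidate that does not reach the jump, so the estimate is not contaminated by the second regime.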

In the case where the variances $\sigma_{j,\mathcal{R},T}$ and $\sigma_{j,\mathcal{U},T}$ are unknown, they may
be estimated as in Section 3.3 above. The homogeneity test is then
based on Theorem 1 and the modification of the following results is
straightforward. The proofs are longer, but the technique in the proof of
Theorem 1 can be used to transfer the problem with estimated variances to
the problem with known variances $\sigma_{j,\mathcal{R},T}$ and $\sigma_{j,\mathcal{U},T}$.
4.3. Properties of the estimator in homogeneous regions. The next result
quantifies the $L_p$-risk ($p > 2$) when the wavelet spectrum $S_j(z)$ is homogeneous
on $z \in \mathcal{R}$. To define this concept of homogeneity, we introduce the
bias

$$b(\mathcal{R}) := \sup_{z \in \mathcal{R}} |S_j(z) - Q_{j,\mathcal{R}}|,$$

which measures how well the wavelet spectrum $S_j$ is approximated by $Q_{j,\mathcal{R}}$
on $z \in \mathcal{R}$. We say that the spectrum is homogeneous (or regular) on $\mathcal{R}$ if the
inequality

(4.4) $b(\mathcal{R}) \le C_j \sigma_{j,\mathcal{R},T} k_T$
holds with 

(4.5) Cj = 2~ j ' 2 y^+p 

for a positive real constant $\alpha$. In the inequality (4.4), $\sigma_{j,\mathcal{R},T}$ is the square
root of the variance of the estimator $Q_{j,\mathcal{R};T}$ of $S_j(z)$, $z \in \mathcal{R}$. As in Spokoiny
(1998), (4.4) can be viewed as a balance relation between the bias and the
variance of this estimate. The term $k_T$ then appears as the correction term
necessary in pointwise estimation in order to bound the normalized loss
[see Lepski (1990), Lepski and Spokoiny (1997)]. In the following results, we
set $k_T$ proportional to $\log_2 T$.

Proposition 5. Let TZ be an interval of (0,1) and consider the test 
rule (4-2). If the wavelet spectrum Sj is regular on TZ in the sense of condi- 
tions (44)-(4.5), then, with 2A = 2n = 2~ j / 2 5(2a +p) and /c T ~log 2 T, 

Vy(TZ is rejected) = 0(T- KpV ~ & ) 
for some positive constant $K$ depending only on $K_2$, $K_3$ and $\|c_X\|_{1,\infty}$.

We can also evaluate an upper bound for the $L_p$-risk associated with our
estimator.

Theorem 2. Assume that the wavelet spectrum at scale $j$, $S_j(z)$, is
homogeneous on the segment $\mathcal{R}$ in the sense of (4.4)-(4.5) with

$k_T \sim \log_2 T$.

If $\hat S_j(z)$ is the pointwise estimator of the wavelet spectrum obtained by the
estimation procedure (4.2)-(4.3) with

ri = 2- j/2 5{2a+p), 

then there exists $T_0$ such that the pointwise $L_p$-loss is bounded as follows:

E\Sj(z) - Sj(z)\ p < K5- p T- p / 2 (2il 2 5- 1 + k T ) p 

for $p > 2$ with a positive constant $K$ and $T > T_0$.

The proof is found in Appendix B.8. 
4.4. Properties of the estimator in inhomogeneous regions. We now describe
the behavior of our estimator near a breakpoint located at a time
point $z_\star$.

For a fixed scale $j \in \{-1, \ldots, -J_T\}$, assume the evolutionary wavelet spectrum
to be homogeneous on $\mathcal{R}_0 = [z_0, z_\star)$ and on $\mathcal{R}_1 = (z_\star, z_1]$. We write
$\mathcal{R} = \mathcal{R}_0 \cup \mathcal{R}_1 = [z_0, z_1]$ and

$$\theta_T := \mathrm{E}(Q_{j,\mathcal{R}_0;T} - Q_{j,\mathcal{R}_1;T})$$

and we assume that $\theta_T > 0$. The value of $\theta_T$ quantifies a change
in the spectrum between the regions $\mathcal{R}_0$ and $\mathcal{R}_1$.

To prove the next proposition, we assume that the estimation procedure
is such that $\mathcal{R}_0$ and $\mathcal{R}_1$ are in $\wp(\mathcal{R})$.

Proposition 6. If the evolutionary wavelet spectrum at scale j contains 
a breakpoint at z* (i.e., Ot > 0) and if Ut ~ k^T, then 

Pr(7£ is not rejected) 

= l exp l 5^ ) + exp { Sip 

where $c$ is a positive constant and $x \wedge y = \min(x,y)$.

The proof of this proposition is given in Appendix B.9. Proposition 6
concerns the consistency of the test of homogeneity. Moreover, it allows
a discussion of the local alternative to this test. We first note that the
alternative hypothesis, that is, the definition of the inhomogeneous region,
depends on the level of the jump $\theta_T$ and the lengths of the two segments $\mathcal{R}_0$
and $\mathcal{R}_1$. As a consequence, in order to study the local alternative, we need
to investigate both cases $\theta_T \to 0$ and $(|\mathcal{R}_0| \wedge |\mathcal{R}_1|) \to 0$. It is interesting to
note that Proposition 6 depends on the product $|\theta_T|(|\mathcal{R}_0| \wedge |\mathcal{R}_1|)$, and the
local alternative of the test is then studied when this product tends to $0$ as
$T \to \infty$. From the proof of Proposition 6, it is straightforward to see that if

$$|\theta_T|(|\mathcal{R}_0| \wedge |\mathcal{R}_1|)\sqrt{T} \to \infty$$

as $T \to \infty$, then the estimation procedure is consistent in the sense that
$\Pr(\mathcal{R} \text{ is not rejected})$ is asymptotically zero.

5. Simulation. We conclude with a brief simulation study. We consider
the evolutionary wavelet spectrum plotted in Figure 1 (upper plot). The
first scale of this spectrum is given by $S_{-1}(z) = \mathbb{1}_{[0.25,0.575]}(z) + (\sin^2(2\pi z - \pi/4) + 0.5)\mathbb{1}_{[0.75,1]}(z)$.
The second scale is inactive. The other active scales are
$S_{-3}(z) = (\sin^2(\pi z - \pi/4) + 0.5)\mathbb{1}_{[0,0.25]}(z)$ and $S_{-4}(z) = (\sin^2(5\pi z - \pi/4) + 0.5)\mathbb{1}_{[0.375,1]}(z)$.
We apply the estimation procedure to 100 different time series
of length 1000 generated from this spectrum with Gaussian increments
and Haar wavelets. For the sake of brevity, we only consider the estimation
at the scale $j = -1$. The results of the 100 simulations are summarized in
the upper plot of Figure 2. At each of the 39 points of estimation, the
vertical segment represents the median and the 90% interquantile interval
from the 100 estimators. The bottom figure shows the estimator (bullets)
from the single simulation given in Figure 1. The continuous line gives the
estimator obtained from the ewspec function of the WaveThresh 3 software
package [Nason (1998)] using the recommendations suggested in this package
for the choice of the parameters (other configurations performed quite
similarly or worse). This estimator is a smoothing of the corrected wavelet
periodogram using TI-wavelet soft thresholding; see Nason et al. (2000) for
details. Note that this method is limited to dyadic sample sizes. As our
simulation contains 1000 observations, we repeat the last observation 24 times.

Fig. 2. The bold line in both graphs is the first scale of the evolutionary wavelet spectrum
considered in Figure 1. The upper figure summarizes the results given from 100 simulations
of the LSW process. In this figure, each vertical interval represents the 90% interquantile
range from the 100 results and the bullet is the median. The bottom figure presents the local
adaptive estimator (bullets) from the realization of the process shown in Figure 1 (lower
plot). The continuous line is the estimator of Nason et al. (2000).
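The test spectrum of this section and the padding step used for the dyadic-only method can be written down as a short sketch. The function names are hypothetical, the indicator intervals are taken as closed, and the code only evaluates the spectrum and pads a series; it does not simulate the LSW process or call WaveThresh.

```python
import math


def indicator(z, lo, hi):
    return 1.0 if lo <= z <= hi else 0.0


def spectrum(j, z):
    """Evolutionary wavelet spectrum S_j(z) of the simulation, z in [0, 1]."""
    if j == -1:
        bump = math.sin(2 * math.pi * z - math.pi / 4) ** 2 + 0.5
        return indicator(z, 0.25, 0.575) + bump * indicator(z, 0.75, 1.0)
    if j == -3:
        return (math.sin(math.pi * z - math.pi / 4) ** 2 + 0.5) * indicator(z, 0.0, 0.25)
    if j == -4:
        return (math.sin(5 * math.pi * z - math.pi / 4) ** 2 + 0.5) * indicator(z, 0.375, 1.0)
    return 0.0  # all other scales are inactive


def pad_to_dyadic(series):
    """Repeat the last observation until the length is a power of two."""
    n = len(series)
    target = 1 << (n - 1).bit_length()
    return list(series) + [series[-1]] * (target - n)
```

For a series of length 1000, `pad_to_dyadic` appends 24 copies of the last value, giving the length 1024 mentioned in the text.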
The mean square error for the local adaptive estimator is lower (0.063)
than for the nonlinear wavelet estimator (0.074). The mean absolute devi- 
ation is also lower (0.152 compared with 0.189 for the wavelet estimator). 
The lower plot of Figure 2 clearly shows the high variability of the ewspec 
estimator in the last part of the spectrum. We explain this phenomenon by 
the cross-correlation between the corrected wavelet periodograms at scales
$-1$ and $-4$. It is interesting to note that our method seems to be more stable
with respect to this phenomenon. This has been observed in comparison
with ewspec using different wavelet families for smoothing. 

In our simulation, it is worth mentioning that the local adaptive estimator 
is computed using the estimated variance, as explained in Section 3.3. Of 
course, there is a set of global parameters which must be chosen. For the 
example treated in this section, we set $M_T = 2$ and $|\mathcal{R}_T| = 9$ [see (3.8)].
With this, we have followed the guidelines given in the companion paper, 
Van Bellegem and von Sachs (2004) (Sections 2.3 and 2.4 therein) on the 
choice of nuisance parameters for the quadratic part of the estimator. In 
particular, two remaining global parameters have been chosen to equal the 
numerical values given for the (different) example of Section 2.5 therein. The 
paper also derives a new test of covariance stationarity and presents some 
applications to medical data analysis. 

APPENDIX A: PROPERTIES OF THE AUTOCORRELATION WAVELET SYSTEM

This section summarizes useful results on the system $\{\Psi_j\}$ and the operator
$A$. Recall that we have denoted by $\mathcal{L}_j$ the length of the support of $\psi_{j0}$ for all
$j = -1, -2, \ldots$, so we have $\mathcal{L}_j = (2^{-j} - 1)(\mathcal{L}_{-1} - 1) + 1 < 2^{-j}\mathcal{L}_{-1}$. We also
recall the definition of the autocorrelation wavelet system $\{\Psi_j; j = -1, -2, \ldots\}$,
which is the convolution of the nondecimated wavelet system,

$$\Psi_j(\tau) = \sum_{k=-\infty}^{\infty} \psi_{jk}(0)\psi_{jk}(\tau).$$

It is straightforward to check that $\Psi_j$ is compactly supported for all $j < 0$
and that the length of its support is bounded by $2\mathcal{L}_j - 1$.

The following lemma recalls other useful results on the autocorrelation
wavelet system.

Lemma A.1. (a) For all scales $j$ and all $\tau$, $\Psi_j(\tau) = \Psi_j(-\tau)$.

(b) The autocorrelation wavelet system $\{\Psi_j; j = -1, -2, \ldots\}$ is linearly
independent.

(c) The identity

(A.1) $\sum_{j=-\infty}^{-1} 2^j \Psi_j(\tau) = \delta_0(\tau)$

holds for all $\tau \in \mathbb{Z}$.

Property (a) is obvious and implies the symmetry of the local autocovariance
function, that is, $c(z,\tau) = c(z,-\tau)$, as expected. Property (b) is proved
as Theorem 1 of Nason et al. (2000) and shows that the local autocovariance
function is uniquely defined. Finally, property (c) is proved as Lemma 6 of
Fryzlewicz et al. (2003) and implies, for instance, that the wavelet spectrum
of a white noise process is proportional to $2^j$ for all scales $j < 0$.
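Properties (a) and (c) can be checked numerically for Haar wavelets with the following sketch. It assumes the usual discrete nondecimated Haar filters of length $2^{-j}$ with values $\pm 2^{j/2}$ and computes $\Psi_j(\tau)$ as the lag-$\tau$ autocorrelation of that filter; the function names are illustrative. The infinite sum in (A.1) is truncated at a finite deepest scale, so the identity is only reproduced up to a remainder of order $2^{-J}$.

```python
def haar_wavelet(j):
    """Discrete nondecimated Haar wavelet at scale j < 0 (length 2**(-j))."""
    half = 2 ** (-j - 1)
    amp = 2.0 ** (j / 2.0)
    return [amp] * half + [-amp] * half


def autocorrelation_wavelet(j, tau):
    """Psi_j(tau): lag-tau autocorrelation of the Haar wavelet at scale j."""
    psi = haar_wavelet(j)
    tau = abs(tau)
    return sum(psi[k] * psi[k + tau] for k in range(len(psi) - tau))


def weighted_sum(tau, deepest_scale):
    """Truncated left-hand side of (A.1): sum of 2**j * Psi_j(tau)."""
    return sum(2.0 ** j * autocorrelation_wavelet(j, tau)
               for j in range(-1, deepest_scale - 1, -1))
```

With twelve scales, `weighted_sum(0, -12)` is close to 1 and `weighted_sum(tau, -12)` is close to 0 for nonzero lags, which is the white-noise autocovariance $\delta_0(\tau)$ recovered from a spectrum proportional to $2^j$.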

As the autocorrelation wavelet system is not orthogonal, we introduce the
Gram matrix $A$ defined by $A_{j\ell} = \sum_\tau \Psi_j(\tau)\Psi_\ell(\tau)$. The following properties
of $A$ are used below.

Lemma A.2. For Haar and Shannon wavelets, there exists a finite positive
constant $\nu$ such that the matrix $A$ fulfills the following properties for
all $j = -1, \ldots, -\log_2 T$:

(A.2) $\sum_{\ell=-\log_2 T}^{-1} A^{-1}_{j\ell} = 2^j + O(2^{j/2}T^{-1/2})$;

(A.3) $\sum_{\ell=-\log_2 T}^{-1} |A^{-1}_{j\ell}| \le \nu(1+\sqrt{2})2^{j/2}$;

(A.4) $\sum_{\ell=-\log_2 T}^{-1} 2^{-\ell/2}|A^{-1}_{j\ell}| \le \nu\, 2^{j/2}\log_2 T$, $\qquad \sum_{\ell=-\log_2 T}^{-1} 2^{-\ell}|A^{-1}_{j\ell}| \le \nu(2+\sqrt{2})2^{j/2}T^{1/2}$.

For all compactly supported wavelets, the matrix $A$ fulfills the following property:

(A.5) $A_{j\ell} \le (2\mathcal{L}_j - 1) \wedge (2\mathcal{L}_\ell - 1) \wedge \sqrt{\mathcal{L}_j\mathcal{L}_\ell}$,

where $x \wedge y = \min(x,y)$.

Proof. The following argument shows that the main term in (A.2) is
$2^j$. Using the fact that $\Psi_\ell(0) = 1$ for all $\ell < 0$ and the identity (A.1), we may
write

$$\sum_{\ell=-\infty}^{-1} A^{-1}_{j\ell} = \sum_{\ell=-\infty}^{-1} A^{-1}_{j\ell} \sum_{m=-\infty}^{-1}\sum_{u=-\infty}^{\infty} 2^m \Psi_m(u)\Psi_\ell(u) = \sum_{m=-\infty}^{-1} 2^m \delta_0(j-m) = 2^j$$

from the definition of $A$. Observe that this argument holds for all compactly
supported wavelets. To compute the remainder of (A.2), we introduce the
auxiliary matrix $\Gamma = D' \cdot A \cdot D$ with diagonal matrix $D = \mathrm{diag}(2^{\ell/2})_{\ell<0}$, that
is, $\Gamma_{j\ell} = 2^{j/2}A_{j\ell}2^{\ell/2}$. Nason et al. (2000), Theorem 2, have proven that the
spectral norm of $\Gamma^{-1}$ is bounded for Haar and Shannon wavelets. We then
get

$$\sum_{\ell=-\infty}^{-\log_2(T)-1} A^{-1}_{j\ell} = 2^{j/2} \sum_{\ell=-\infty}^{-\log_2(T)-1} 2^{\ell/2}\,\Gamma^{-1}_{j\ell} = O(2^{j/2}T^{-1/2}).$$

To prove (A.3), note that $\sum_{\ell=-\log_2 T}^{-1} |A^{-1}_{j\ell}| = \sum_{\ell=-\log_2 T}^{-1} 2^{j/2}2^{\ell/2}|\Gamma^{-1}_{j\ell}| \le 2^{j/2}(1+\sqrt{2})\nu$,
using $\sup_{j,\ell}|\Gamma^{-1}_{j\ell}| \le \nu$. (A.4) is obtained similarly, using the
approximation $\sum_{j=-\log_2 T}^{-1} 2^{-j/2} < (2+\sqrt{2})\sqrt{T}$. (A.5) follows from the definition
of $A_{j\ell}$ and the support of the autocorrelation wavelets, using $|\Psi_j(\tau)| \le 1$
uniformly in $j$ and $\tau$. □
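The key step of the proof, namely that (A.1) gives $\sum_m 2^m A_{m\ell} = \Psi_\ell(0) = 1$, can be verified numerically for Haar wavelets. The sketch below assumes the discrete Haar filters of length $2^{-j}$ (so that $\mathcal{L}_j = 2^{-j}$) and truncates the sum over scales at a finite depth, so the identity holds only approximately; all names are illustrative.

```python
def haar_wavelet(j):
    """Discrete nondecimated Haar wavelet at scale j < 0 (length 2**(-j))."""
    half = 2 ** (-j - 1)
    amp = 2.0 ** (j / 2.0)
    return [amp] * half + [-amp] * half


def autocorrelation_wavelet(j, tau):
    psi = haar_wavelet(j)
    tau = abs(tau)
    return sum(psi[k] * psi[k + tau] for k in range(len(psi) - tau))


def gram_entry(j, l):
    """A_{jl} = sum over tau of Psi_j(tau) * Psi_l(tau)."""
    reach = min(2 ** (-j), 2 ** (-l)) - 1  # both factors vanish beyond this lag
    return sum(autocorrelation_wavelet(j, t) * autocorrelation_wavelet(l, t)
               for t in range(-reach, reach + 1))


def weighted_column_sum(l, deepest_scale):
    """Truncated version of the sum over m of 2**m * A_{ml}."""
    return sum(2.0 ** m * gram_entry(m, l)
               for m in range(-1, deepest_scale - 1, -1))
```

For example, `gram_entry(-1, -1)` equals 1.5 for Haar wavelets, and `weighted_column_sum(l, -8)` is close to 1 for the first few scales `l`, as the argument above predicts.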

APPENDIX B: PROOFS 

Suppose $M$ is an $n \times n$ matrix and $M^*$ is the conjugate transpose of $M$.
We denote by

$$\|M\|_2 := \sqrt{\operatorname{tr}(M^*M)}$$

the Euclidean norm of $M$ and by

$$\|M\|_{\mathrm{spec}} := \max\{\sqrt{\lambda} : \lambda \text{ is an eigenvalue of } M^*M\}$$

the spectral norm of $M$. If $M$ is symmetric and nonnegative definite, then
by standard theory, we have $\|M\|_{\mathrm{spec}} = \sup\{\|Mx\|_2 : x \in \mathbb{C}^n, \|x\|_2 = 1\}$. We
will also use the following standard relations which hold for all symmetric
matrices $B, C$:

(B.1) $\|B\|_{\mathrm{spec}} \le \|B\|_2$;

(B.2) $\|B\|_{\mathrm{spec}} = \max\{\lambda : \lambda \text{ is an eigenvalue of } B\}$;

(B.3) $\|BC\|_{\mathrm{spec}} \le \|B\|_{\mathrm{spec}}\|C\|_{\mathrm{spec}}$;

(B.4) $\|BC\|_2 \le \|B\|_{\mathrm{spec}}\|C\|_2 \le \|B\|_2\|C\|_2$.

In the sequel, we use the convention $w_{j,k;T} = 0$ for $k < 0$ and $k > T$, which
leads to helpful simplifications in the following proofs.
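As a quick sanity check of relations (B.1)-(B.4), the following sketch evaluates both norms for real $2 \times 2$ matrices, where the largest eigenvalue of $M'M$ has a closed form. The restriction to $2 \times 2$ real matrices and the helper names are assumptions made to keep the example self-contained.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def euclidean_norm(A):
    """sqrt(tr(A'A)), i.e. the square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))


def spectral_norm(A):
    """Square root of the largest eigenvalue of A'A for a real 2 x 2 matrix."""
    G = matmul(transpose(A), A)  # symmetric and nonnegative definite
    mean = (G[0][0] + G[1][1]) / 2.0
    radius = math.hypot((G[0][0] - G[1][1]) / 2.0, G[0][1])
    return math.sqrt(mean + radius)
```

For the symmetric positive definite matrix `[[2, 1], [1, 3]]`, `spectral_norm` returns its largest eigenvalue $(5+\sqrt{5})/2$, in agreement with (B.2).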
B.1. Proof of Proposition 1. On one hand, due to Definition 1 and equation
(2.2), we have

— 1 oo 

CX,T{z,T) = Cov(X[ zT]tT ,X[ zT]+TtT ) = E \ W j,k+[zT];T\ 2 ' l l ; jk( )'>Pjk{T) 

j=—co k=—oo 

= E E S;(^^WohMr) +Rest T (z,r), 

j = — oo k— — oo 

where the remainder is such that 

— 1 oo 

\Rest T (z,T)\=0(T- 1 ) Y, E C7#jfc(0)^-fc(r)|, 

j=—co k=—oo 

by assumption (2.2). On the other hand, we have cx(z,t) = 
Eji-oo S"=-oo 5j (z) V# (O)Vifc (r) . Then, 

oo i 

E / ^|cx,r(^,r) -c x (z,t)\ 

< E E E 

r=— oo ^ j=— oo k=— oo 
oo „! 

+ E / ^|Rest T (2,r)|. 



T = — OO 



With appropriate changes of variable, this bound may be written as 



-1 oo T-l 



E E E E / ^ 

t=— ooj=— ook=— oo i=0 



1/T 



5 i( ^ ) ■ T 



|^-fe(0)^ fe (r)| 



oo „ x 

+ V / dz\Rest T (z,T)\, 



_ '0 

which is bounded by 

oo —1 oo oo „i 

T ~ l E E E l*|TV(5i)|^fc(0)^- fc (r)|+ E / ^|Rest T (z,r)|, 



T= — oo J= — oo k = — oo 

where we have used the following property of the total variation: 

T-l 

< \a-(3\TV(Sj) for all a, /3 EN. 



E 

t=0 

(B.5) 
As the support of $\psi_{jk}(0)$ is of length $\mathcal{L}_j$, we get $|k| < \mathcal{L}_j$ in the first term.
Together with condition (2.3) of Definition 1, this finally leads to

oo -i 

E / dz\c x ,T(z,T)-c x (z,T)\ 

r=-oo J 

— 1 oo 

KOiT- 1 ) (Cj + CjLj) £ |^ fc (0)^ fc (r)|. 

j = — oo r,fc= — oo 

The compact support of $\psi_{jk}$ limits the sums over $k$ and $\tau$ as follows:

oo C-j ~ 1 oo 

(B.6) \^jk(^ jk (r)\= E IV'ifc(0)^(r)|<2£ J -l, 

T,k=— oo t=— fc=— oo 

by the Cauchy-Schwarz inequality for the sum over k. We then get the result 
by (2.4). 

B.2. Preliminary results. Define $X_T = (X_{0,T}, \ldots, X_{T-1,T})'$. By definition,
$Q_{j,\mathcal{R};T}$ can be decomposed into the sum of a quadratic and a linear
form,

(B.7) $Q_{j,\mathcal{R};T} = Q^\circ_{j,\mathcal{R};T} + q^\circ_{j,\mathcal{R};T}$,

where

(B.8) $Q^\circ_{j,\mathcal{R};T} = X_T' U_{j,\mathcal{R};T} X_T$

is a quadratic form with the $T \times T$ matrix $U_{j,\mathcal{R};T}$ whose entry $(s,t)$ is

$$U_{st} = |\mathcal{R}T|^{-1} \sum_{\ell=-\log_2 T}^{-1} A^{-1}_{j\ell} \sum_{k \in \mathcal{R}T} \psi_{\ell k}(s)\psi_{\ell k}(t)$$

and $q^\circ_{j,\mathcal{R};T} = |\mathcal{R}T|^{-1}\sum_{k \in \mathcal{R}T} z_{j,k;T}$ is the linear form. For notational convenience,
we omit the dependence of $U_{st}$ on $j$ and $\mathcal{R}$. Assuming that the
orthonormal increment processes $\{\xi_{jk}\}$ in Definition 1 are Gaussian, $X_T$
is a multivariate Gaussian random variable with covariance matrix $\Sigma_T = \operatorname{Cov}(X_T X_T')$.
Therefore, we can write

$$Q_{j,\mathcal{R};T} = Z_T' M_{j,\mathcal{R};T} Z_T + q^\circ_{j,\mathcal{R};T},$$

where $Z_T = (Z_1, \ldots, Z_T)'$ is a vector of i.i.d. Gaussian random variables with
zero mean and $\operatorname{Var} Z_1 = 1$, and

(B.9) $M_{j,\mathcal{R};T} = \Sigma_T^{\prime 1/2}\, U_{j,\mathcal{R};T}\, \Sigma_T^{1/2}$

is the matrix of the quadratic form.

In our proofs, we use the following lemma quoted from Neumann (1996).
Lemma B.1. Let $Z_n = (Z_1, \ldots, Z_n)'$ be a vector of i.i.d. Gaussian random
variables with zero mean and $\operatorname{Var} Z_1 = 1$. If $M_n$ is an $n \times n$ real matrix,
then

$$\mathrm{E}(Z_n' M_n Z_n) = \operatorname{tr} M_n,$$
$$\operatorname{Var}(Z_n' M_n Z_n) = 2\operatorname{tr} M_n' M_n = 2\|M_n\|_2^2$$

and, for all $r \ge 2$, if $\mathrm{Cum}_r$ denotes the $r$th cumulant, we have

$$|\mathrm{Cum}_r(Z_n' M_n Z_n)| \le 2^{r-1}(r-1)!\,\|M_n\|_2^2\,\{\lambda_{\max}(M_n)\}^{r-2}.$$
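The first two moment formulas of Lemma B.1 can be illustrated by a small Monte Carlo experiment. The sketch below uses a symmetric test matrix, a fixed seed and a modest number of draws; these settings and the function names are illustrative assumptions, and the empirical moments only approximate $\operatorname{tr} M$ and $2\|M\|_2^2$ up to simulation error.

```python
import random


def quadratic_form(M, z):
    n = len(z)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def squared_euclidean_norm(M):
    return sum(x * x for row in M for x in row)


def simulate_moments(M, draws=50_000, seed=0):
    """Empirical mean and variance of Z'MZ for standard Gaussian Z."""
    rng = random.Random(seed)
    n = len(M)
    samples = [quadratic_form(M, [rng.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(draws)]
    mean = sum(samples) / draws
    var = sum((s - mean) ** 2 for s in samples) / (draws - 1)
    return mean, var
```

For the symmetric matrix `[[2, 1, 0], [1, 1, 0.5], [0, 0.5, 3]]`, the lemma gives expectation 6 and variance 33, and the simulated values should fall close to these numbers.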

The following lemmas derive some bounds for the Euclidean and spectral
norms of $U_{j,\mathcal{R};T}$ and $\Sigma_T$.

Lemma B.2. With fixed $\mathcal{R} \subset (0,1)$, there exists a $T_0$ such that, uniformly
in $T > T_0$,

$$\|U_{j,\mathcal{R};T}\|_2^2 \le K_2\, 2^j\, |\mathcal{R}|^{-2}\, T^{-1}$$

for all $j = -1, \ldots, -J_T$ with $J_T = o_T(\log_2 T)$, where $K_2$ depends only on the mother
wavelet $\psi$.

PROOF. If we let TZ = (ri,r 2 ) C (0, 1), then we can write U st = uj%' - 
U§\ where := \KT\~ 1 £*Aj/ EtJ" 1 ^aWM is the element (s,t) 

of a matrix U^. T and U$ := \TZT\~ 1 J2e A ]l YX^q ^fe(*)^fe( s ) is the ele- 

(2) 

ment (s, t) of a matrix U^. T . Note that the compact support of the wavelet 

ij) implies that U^' = when s or t > [r±T] and, similarly, U£t = when 

s or t > [r 2 T]. We also introduce the matrix U*^. T whose entry (s,t) is 

U*^ := \KT\- 1J £,zAjlV e (s - t)I < S;t<[riT] and similarly define U*$. T . We 
now have the decomposition 

1177 „ ^ii 2 < 9iir/ (1) 77 (1) * \\ 2 ^ dwn^ 77 (2) * II 2 
+ A\\U (1> -U (2h II 2 

^^\\ U j,K;T U j,TZ;T\\2- 

From the definition of the autocorrelation wavelet ^f, the first term is 

1177(1) U W* II 2 

\\ u j,Tl;T u j,K;T\\2 

-1 [riT]-l 00 

= \KT\- 2 A jl A Jm E E ^k{t)i>i k {s)i> mn (t)i> mn (s). 

l,m=-\og 2 T s,t=0 k,n=[nT} 

The compact support of $\psi_{\ell k}(s)$ implies that $s > k - \mathcal{L}_\ell > ([r_1T] - \mathcal{L}_\ell) \vee 0$.
Using the same argument on $\psi_{mn}(t)$, we have $t > ([r_1T] - \mathcal{L}_m) \vee 0$. Using
the Cauchy-Schwarz inequality twice for the sums over $k$ and $n$, we get the
bound
\\^% T -U^ T \\l<\KT\->( £ C t \Aj}\) 3 

\e=-iog 2 T ) 

< \KT\- 2 v 2 (2 + y/2f&TC 2 _ x , 

by (A. 4). The second term is bounded similarly. The third term is bounded 

by 2||£7y£* T ||2 + 2||£7-^* T ||2 and each term of this last sum can be bounded 
as 

11/7(1)* ||2 
\\ U j,-R.;T\\2 

T-l 00 

<\nrr 2 J2 E H A u A l^^-t)^ m {s-t) 

s=0 t=—oo £,m 

= T\KT\- 2 A- 1 , 
which leads to the result. □ 

Finally, the proof of the following lemma is similar to the proof of Lemma
5.9 in Dahlhaus and Polonik (2006).

Lemma B.3. Under assumption (3.5), $\|\Sigma_T\|_{\mathrm{spec}} = \|\Sigma_T^{1/2}\|^2_{\mathrm{spec}} \le \|c_X\|_{1,\infty} < \infty$.

B.3. Proof of Proposition 2. 

Expectation. In decomposition (B.7), we first note that $\mathrm{E}q^\circ_{j,\mathcal{R};T} = 0$.
Next, a straightforward expansion leads to

-1 T-l 

k&lTl=-\og 2 T s,t=0 
— 1 00 

X E E w mn;T^mn(s)lp m n(t) 
m=— oo n=— 00 

= |wrr 1 E E A jl 

kelZT i=-\og 2 T 
-1 00 /T-l \ 2 

X E E W lin;T[ E ^(s)V'm.n(s) I • 
m=— 00 n=— oo \s=0 I 
Defining $u := n - k$, we can write

— 1 oo 

EQj,^;T = \7ZT\ 1 X! X] E W m,u+k,T 
k^TZT m=— oo u=— oo 

-1 / oo \ 2 

x E ^7m X i>£k(s)ipm,u+k{s) I • 
£=-log 2 T \s=-oo / 

By Definition 1, we can write u+k T = S m (k/T) + Rxim, u, k) with 



1*, k)\ < 

which leads to 



Sn 



u + k 



[- 



CC n 



EQIk;T=\KT\- 1 E E Sm(t) 

7- f— ' I f r 3 n fin — n/> v / 



fce7£T m=-oo 
— 1 oo / oo \ 2 

x E ^7/ E ( E ^zk{s)ipm, u +k{s) I + Rest T . 

£=— log 2 T m=— 00 \s=~oo / 

By construction of the matrix A, we observe that 

00/00 \ 2 

(B.10) A lm = E I E V ; ^(s)V'm,«+fc(s) j , 

u=— 00 \s=— 00 / 

which implies, by Assumption 4, that 

k 



VQ° j ,Ti;T = \KT\- 1 E Sj U +Rest T 



(B.ll) 



keKT 



\TZ\~ 1 f dzS j (z) + 0(\TZT\~ 1 L j ) + Rest T , 



where the last equality is a standard result on the total variation [see, e.g., 
Lemma P5.1 of Brillinger (1975)]. 

We now bound | Restr |- As s goes from —00 to 00, we have 

-1 -1 
|Rest T |< E E \ A Jt\ 



00 

x E m- 1 E 

u=-oo keTZT 
/ 00 \ 

x E ^eo(s)ip mu (s) 



u + k 



Sm. „ 



+ 



T 
Using (B.5) for the sum over $k$, $|\mathrm{Rest}_T|$ is bounded by

m=— oou=- oo ^11 ' £=— log 2 T \s=— oo / 

In this last expression, the compact support of $\psi_{\ell 0}$ and $\psi_{mu}$ implies that
$|u| < \mathcal{L}_\ell \vee \mathcal{L}_m$, where $x \vee y = \max(x,y)$. Together with (B.10), we get

-l -l 

| RestT | < \RT\~ 1 E E {TV(5 m )(^V£ m ) + CC m }|Aj/|^ m , 

m=— oo log 2 T 

which, with (A. 5), leads to 
| Restr I 

< |^rr 1 ^{TV(5 m )£,(2£ m - 1) 

(B.12) +TV(S m )C m (2C e - 1) + CC m (2£ m - l)}\Ajl\ 

= 2(2 + ^2)v2? l2 \TlTy x \fTC- 1 
-i 

x (2£ m -l)TV(5 m ) + 0(2^2|^r|- 1 ), 

m=— oo 

using (A. 4) and (2.4). 

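Variance. Using decomposition (B.7), the variance is decomposed as
$\operatorname{Var} Q_{j,\mathcal{R};T} = \operatorname{Var} Q^\circ_{j,\mathcal{R};T} + \operatorname{Var} q^\circ_{j,\mathcal{R};T}$, where $\operatorname{Var} q^\circ_{j,\mathcal{R};T} = C^2 2^j/|\mathcal{R}T|$. Using
Lemma B.1 with (B.4), we can write $\operatorname{Var} Q^\circ_{j,\mathcal{R};T} = 2\|M_{j,\mathcal{R};T}\|_2^2 \le 2\|\Sigma_T^{1/2}\|^4_{\mathrm{spec}}\,\|U_{j,\mathcal{R};T}\|_2^2$
and the result follows from Lemmas B.2 and B.3.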

B.4. Proof of Proposition 3 and its consequences. Our proof of Propo- 
sition 3 requires the use of an exponential bound for linear and quadratic 
forms of Gaussian random variables. For the sake of presentation, we here 
summarize the results we use. 

Proposition B.1. Let $Z$ be a Gaussian random variable with mean
zero and unit variance. Then, for all $\lambda > 0$,

Pr(|z|>A) 4 A I75>" /2 ' 

where $a \wedge b = \min(a,b)$.
Let $Z_n = (Z_1, \ldots, Z_n)'$ be a vector of i.i.d. Gaussian random variables with
zero mean and $\operatorname{Var} Z_1 = 1$. If $M_n$ is an $n \times n$ matrix such that $\|M_n\|_{\mathrm{spec}} \le \tau_\infty$
and $\sigma_n^2 = 2\|M_n\|_2^2$, then, for all $\lambda > 0$,

Pv((Z! n M n Z n - tr M n ) > a n X) < 2exp 

Moreover, if $Y$ is a Gaussian random variable with mean zero and variance
$\sigma^2 \le \sigma_n^2$, then

Pv((Z! n M n Z n + Y - tr M n ) > a n \) < 3exp 

Proof. We prove the first inequality. On the one hand, by Chebyshev's
inequality,

$$\Pr(Z > \lambda) \le \inf_{t>0} \exp\{-t\lambda + \log \mathrm{E}(e^{tZ})\},$$

where $\mathrm{E}(e^{tZ}) = e^{t^2/2}$. The minimum is attained for $t = \lambda$ and we get $\Pr(|Z| > \lambda) \le e^{-\lambda^2/2}$.
On the other hand, a straightforward calculation leads to

$$\Pr(Z > \lambda) = \int_\lambda^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt \le \int_\lambda^\infty \frac{t}{\lambda\sqrt{2\pi}} e^{-t^2/2}\,dt = \frac{1}{\lambda\sqrt{2\pi}} e^{-\lambda^2/2}$$

and the result follows. The second inequality follows the proof of Proposition
A.1 in Dahlhaus and Polonik (2006). The last inequality is derived from the
two former inequalities. □
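The two Gaussian tail bounds used in this proof can be compared with the exact tail probability, which is available through the complementary error function. The sketch below is a numerical illustration only; the function names are hypothetical.

```python
import math


def gaussian_upper_tail(lam):
    """Exact Pr(Z > lam) for a standard Gaussian Z."""
    return 0.5 * math.erfc(lam / math.sqrt(2.0))


def exponential_bound(lam):
    """Bound exp(-lam**2 / 2) obtained from the moment generating function."""
    return math.exp(-lam * lam / 2.0)


def integral_bound(lam):
    """Bound exp(-lam**2 / 2) / (lam * sqrt(2 * pi)) from the integral argument."""
    return math.exp(-lam * lam / 2.0) / (lam * math.sqrt(2.0 * math.pi))
```

The integral bound is the sharper of the two as soon as $\lambda\sqrt{2\pi} > 1$, which is why taking the minimum of both is useful for small and large $\lambda$ alike.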

As in the proof of Proposition 2, equation (B.9), we write $Q_{j,\mathcal{R};T}$ as a
quadratic form of Gaussian variables in order to apply Proposition B.1 with
$M_{j,\mathcal{R};T} = \Sigma_T^{\prime 1/2}\, U_{j,\mathcal{R};T}\, \Sigma_T^{1/2}$ and thereby prove the assertion.

Proof of Proposition 3. We use the last exponential inequality of
Proposition B.1 because $Q_{j,\mathcal{R};T}$ can be decomposed [see (B.7)] into $Q^\circ_{j,\mathcal{R};T} + q^\circ_{j,\mathcal{R};T}$,
where $Q^\circ_{j,\mathcal{R};T} = Z_T' M_{j,\mathcal{R};T} Z_T$ and $q^\circ_{j,\mathcal{R};T} \sim \mathcal{N}(0, C^2 2^j/|\mathcal{R}T|)$. Note
that Lemmas B.2 and B.3 imply, with (B.1) and (B.3), that

(B.13) $\|M_{j,\mathcal{R};T}\|_{\mathrm{spec}} \le 2^{j/2}\,\|c_X\|_{1,\infty}\,|\mathcal{R}|^{-1}\,T^{-1/2}$.

Therefore, Proposition B.l leads to 

Pr((Qj,K;T - Qj,n) > Wj,K,T) 

< Pr((Q j;K;T -EQj,K;T) > Wj,TZ,T/2) 



4 l + 2(Ar 00 /a^ 



f-i x - V 

^ 4 l + 2(XT 00 /a n ) 
< 3exp^-^
+ exp ( 1 



16 l + V {^/ 2 K 2 \\c x \\ 1>00 )/{\n\T^a jtnt T) 



2 1 EQj,K]T - Qj,n\ J 

To bound the second probability, we observe that (B.11) and (B.12) lead to
$|\mathrm{E}Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| \le |\mathcal{R}T|^{-1}(L_j + K_3 2^{(j/2)-1}\sqrt{T})$ with $K_3 = 4\nu(2+\sqrt{2})(2\rho - 1)(C \vee 1)\mathcal{L}_{-1}$.
This implies



Pr((Qi,w;T - Qj,n) > Wj,n,T) 



< 3 exp 
+ exp ( 1 



1 

16 ' a j! T Z ,T + v(^ /2 K 2 \\cx\\i,ooVf)/\TZT\ 
1 7 l 2(7 j,n,T 



2t? (Lj + K32O'/ 2 )- 1 VT)/\KT\ 



and the result follows. □ 



Proof of Corollary 1. In the following proof, $K$ denotes a generic
constant and $k_T$ is an increasing function of $T$. By Proposition 2, $\sigma^2_{j,\mathcal{R},T} :=
\operatorname{Var} Q_{j,\mathcal{R};T} \le (C^2 + c^2/|\mathcal{R}|)2^j/|\mathcal{R}T|$ uniformly in $j$, which implies
Prf sup \Q jt K;T - Q h n\ > k T J(C 2 + c 2 /\K\)/\11T\) 

\-Jt<3<0 / 
-1 

< ^(\Q^T-Qj,Tz\>^ j/2 k T a jt n,T)- 
3=-Jt 

Using Proposition 3, this probability is bounded by 

1 



cqJ t max exp<{ -— • (2 3 k\j2) 
-J T <j<o l lb 1 



2k T 2- j / 2 Lj 

1 + 



(\KT\a h1 i,T) 
k T VT 



(K 2 1| c x || 1,00+^3) 



Proposition 2 shows that, for T sufficiently large, (Jj.u.T > y 2- ? /|7£X'|. This 
leads to the bound 



c J T max expj--^ • (fc^/2) 
-Jt<j<o I lb ' 



PI 
2^ 2 k T 



(K 2 \\c x \\i,oo + K 3 
By assumption (2.4), there exists a positive constant $\rho'$ such that $L_j \le 2^{j/2}\rho'$.
Then, asymptotically, the rate of convergence of the dominant terms in this
exponential expression is given by $J_T \cdot \exp(-k_T)$, which is $o_T(1)$ by the
assumption on $k_T$. □

B.5. Proof of Proposition 4. 

Lemma B.4. IfU$ = \TZT\- 1 EZt-i og2 T A jl EkenT^ik{s)^ ek (t), then 
V V \n (j) n (j) \^ , <v+ 1 r tl 2 TN T lo & T 

\ U ts U tu \*\s-u\<N T - 1 L -^ V \jzr\2 

t=—oo s,u=—oo ' ' 

Q / JV T l0glT 



T 



Proof. Direct calculation yields 

i\<N T 



E E \ui?u£\\-u\ : 



t=—oo s,u=— oo 

— 1 oo 

<\TZT\~ 2 E I^II^JrJ E l \s-u\<N T 
£,m=-log 2 T s,u=— oo 

X E E \^ik(s)^£k(t)\) i E \^mn{u)i) m n(t)\ J. 

Using the Cauchy-Schwarz inequality for the sum over t, we get a product 
of two terms similar to (J2t(J2k V , £fc( s )V'£fc(*)) 2 ) 1 ^ 2 — V2£e — 1- Then, 

oo oo 

E E l^M } lv«i<^ 

i=— OO s,u=— oo 

< TiV T |^Tr 2 E l^/ll^lv^ - lV2£ m - 1 

and we obtain the result by (A. 4). □ 

In the proof of Proposition 4, we need a modification of Corollary 1, in
which $\mathcal{R}$ is replaced by $\mathcal{R}_T$. The proof of the following result is along the
lines of the proof of Corollary 1.

Lemma B.5. Under the assumptions of Propositions 2 and 3, there exists
$T_0 > 1$ such that, for all $T > T_0$,


Pr L,s<o IQ ^ T(s);T_Q ^ T(s)l - w\i =or(1) ' 
provided that J? • exp(— &tvk^H) = °t(1)-

Proof of Proposition 4. Define a s , s+u := T.jl-\ og2T Qi>,n T (s)^i{u) x 
I|u|<m T i the entries of a matrix S, and define v^ht := 2||fj ^-t^tII! + 
C 2 2i /\RT\ . Our proof is based on the decomposition 

p;- 2 ~ 2 —I';;- 2 ^ 2 ^ _l ^ 2 ^ 2 ^\ 
^VK.t ~~ a j,n,T — \ a j,n,T ~ a j,n,T) + \ a j,n,T ~ a j,n,T)i 

where the first term is stochastic while the second term is deterministic.

We will first show that the deterministic term $|\bar\sigma^2_{j,\mathcal{R},T} - \sigma^2_{j,\mathcal{R},T}|$ is $o(2^{j-J_T}T^{-1})$.
Using (B.4), we can write

l/=.2 2 \ 

2\ a j,K,T ~ a j,TZ,T) 

= II UjfaT^T || 2 ~~ II UjfcT^T || 2 

— II^^t^t — ||| + 2 • ||[/' n -T^rh ■ \Wj -ji-t^t — St)||2 



< II^'^tIII " II^t — £t" 2 



spec 



+ 2 • ||^j,^;T||2 " ||St||spcc - \\^T ~ Sr||speo 

where we know, by Lemmas B.2 and B.3, that HC^/r^tH! = 0(2 J T~ 1 ) and 
HStIIspgc < ||cx||i,oo- Moreover, we can write 



| S7 1 — 



spec 
00 



(B.14) 



u=— 00 



s+u) 

s 



where 



= E SU P E J2 (w£n;T-Qe,Tl T (s)) 
u=—oo s £=—oon=—oo 

x ip en (s)ipe n (s + u) + Ri + R 2 , 

00 —1 

Ri = sup ^ Q^ T ( s )^( n ) I i«l>^T' 

u=— 00 s £=—00 

00 -Iog 2 (T)-l 

R 2= H SU P H Q^t(s)*^('") I |u|<-^t- 
u=— 00 s £=—00 



As 

00 



V sup V Qe,ii T ( s )^e(u)= V sup|^t| 1 / ^c x ( 



£=-00 
the rate of $R_1$ is $o_T(2^{-J_T})$, by Assumption 6. Next, using $|\Psi_\ell(u)| \le 1$ uniformly
in $\ell < 0$, we get

oo -log 2 (T)-l 

IR2I < E supl^rT 1 / dz V Si(z)I\ u \ <Mt 

u=-oc s Jn T(s) ^ = _ QO 

-log 2 (T)-l 

<2M T £ sup S e (z) = 0(M T /T), 

l=-oo Z 

using Assumption 4. Assumption 5 on the rate of the truncating sequence
$M_T$ implies $|R_2| = o_T(2^{-J_T})$. The main term of (B.14) is bounded by

00 —i 00 . 

E sup E E \ U t\~ 1 dz\wj T -S e (z)\ 



-co n=— 00 



(B.15) 

By Definition 1, we can write 

\wfn-T ~ Si{z)\ < 



X \lp£n(s)lpl n (s + u)\. 



CCp 



+ 



St 



+ 



Se(z) - Si 



-s £ 

n — s 



n — s 



T 



+ z 



+ z 



which, when substituted into (B.15), leads to three terms. By (B.6) and (2.4),
the first term is $O(T^{-1})$. For the second term, with a change of variable $z$
to $z + s/T$, we get

-l 00 



E -p E E i^r 1 / 

._ „ s „ „ JT\ 



dz 



-co n=— 00 



ft 



7? 



where 1Zt(0) denotes the interval TZt(s) shifted by — s. If we use the fact 
that |^>£n(s)| is uniformly bounded and J2^=-oo \^tn{s + u)\ =0(Ce), the 
second term is then bounded (up to a multiplicative constant) by 



— 1 . OO , s 

i^r 1 e a / <** £ S AV] 



sA-)-s e - + z 



11 



<\TZt\^ V £ £ / dz|z|TV(ft 



m T (o) 

0(\K T \) J2 £eL e = 0(\K T \), 



e=-oo 
-1 



£=-00 
by assumptions (2.3) and (2.4). The third term is

oo — 1 oo 



E su p E E m~ x 

i=— oo s £=—oon=—oc 



dz 



n — s 



+ z 



X |^ n (s)^n(s + «)|- 

If $s_0$ denotes the infimum of $\mathcal{R}_T(s)$, we decompose the integral as follows:



— 1 oo 



\KtT\-1 So+ ( fe+1)/T 



fc=0 Js o+ k / T 



dz 



S e (z) - S e 



n 



+ z 



u=— oo £=— oo ra=— oo 

X |^n(s)^n(s + «)|, 

which can be rewritten, with the change of variable $y := z - s_0 - k/T$, as

oo — 1 oo 

E su p E E m- 1 



i=—oo n=— oo 
\K T T\-1 y T 



x E 

fc=0 







dy 



k\ _ / ra - s + /c 
^( 2/ + S0 + - J -5Wy + s H 



X |^ n (s)^ n (s + «)|- 
Assumption (2.3) for the sum over k with (B.5) leads to the bound 

oo —1 oo 

E SU P E L « E I^T^rV - s\ \lp£n(s)lpen{s + «)|- 
u=—oo s l=— oo n=— oo 

The compact support of $\psi_{\ell n}(s)$ implies $|n - s| \le \mathcal{L}_\ell$. Therefore, (B.6), (2.3)
and (2.4) imply that this last term is $O(|\mathcal{R}_T T|^{-1})$. Finally, we summarize
all the rates of convergence for the deterministic term as follows:

2~ 3 T- (o-] nT -al nT ) 

$= O(T^{-1} + |\mathcal{R}_T| + |\mathcal{R}_T T|^{-1}) + |R_1| + |R_2|$

$= O(T^{-1} + |\mathcal{R}_T| + |\mathcal{R}_T T|^{-1}) + o_T(2^{-J_T}) + o_T(2^{-J_T})$

$= o_T(2^{-J_T}),$



by Assumption 5. 

Let us now turn to the stochastic term $|\hat\sigma^2_{j,\mathcal{R},T} - \tilde\sigma^2_{j,\mathcal{R},T}|$. Lemma B.5 implies
the existence of a random set $\mathcal{A}$ which does not depend on $j$ and such
that $\Pr(\mathcal{A}) \ge 1 - o_T(1)$ and $|Q_{j,\mathcal{R}_T(s);T} - Q_{j,\mathcal{R}_T(s)}| \le (k_T/|\mathcal{R}_T|)\sqrt{(C^2 + c^2)/T}$
almost surely on $\mathcal{A}$, for all $T > T_0$ and $j = -1, \ldots, -J_T$. We can write



I _ 2 - 2 

\ a j,1Z,T ~ a j,n,T\ 



3G 
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(B.16) 



T-l 
h,t=0 



s,u=0 



-1 



E (Qe,n T (s);TQm,n T (u);T 



- Qe,n T (s)Q m ,n T (u))^i(s - h)V m (u - h) 

x \~h\<M T ^\u~h\<M T 

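almost surely on $\mathcal{A}$. Using the decomposition

$$Q_{\ell,\mathcal{R}_T(s);T}\,Q_{m,\mathcal{R}_T(u);T} - Q_{\ell,\mathcal{R}_T(s)}\,Q_{m,\mathcal{R}_T(u)}$$
$$= (Q_{m,\mathcal{R}_T(u);T} - Q_{m,\mathcal{R}_T(u)})\,Q_{\ell,\mathcal{R}_T(s)}$$
$$\quad + (Q_{\ell,\mathcal{R}_T(s);T} - Q_{\ell,\mathcal{R}_T(s)})\,Q_{m,\mathcal{R}_T(u)}$$
$$\quad + (Q_{\ell,\mathcal{R}_T(s);T} - Q_{\ell,\mathcal{R}_T(s)})(Q_{m,\mathcal{R}_T(u);T} - Q_{m,\mathcal{R}_T(u)}),$$

we get three terms in the right-hand side of (B.16). On $\mathcal{A}$, the first of these
terms is bounded as follows (the other terms are bounded similarly):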



2E 

h,t,s,u 



^ts^tu ^2{Qm,K T (u);T ~ Qm,K T (u)) 



^ m (u -h)J2 Qe,n T (s)^e(s - h) 



\~u\<2M T 



\KtWT 



tu 



h,t,. 



\s-u\<2M T 



<2V1 + C — Z^\ U ts U tu \ l \s-u\<2M T 



\Kt\VT t % 



E SU P 



= 0(2 J M T k T \n T T\- l T- 112 \ogl T) a.s. on A, 

using Assumption 1 and Lemma B.4. The result then follows from Assump- 
tion 5. □ 
B.6. Proof of Theorem 1. By Proposition 4 and for $T$ large enough,
there exists a random set $\mathcal{A}$ such that $1 - \Pr(\mathcal{A}) = o_T(1)$ and (3.9) holds on
$\mathcal{A}$. Then, if $\mathcal{A}^c$ denotes the complementary random set of $\mathcal{A}$, we can write

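$$\Pr(|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > 2\hat\sigma_{j,\mathcal{R},T}\eta)$$
$$= \Pr(|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > 2\hat\sigma_{j,\mathcal{R},T}\eta \mid \mathcal{A})\Pr(\mathcal{A})$$
$$\quad + \Pr(|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > 2\hat\sigma_{j,\mathcal{R},T}\eta \mid \mathcal{A}^c)(1 - \Pr(\mathcal{A})).$$

The second term of this sum is $o_T(1)$, by Proposition 4. To bound the first
term, we observe that Proposition 4 implies $\hat\sigma^2_{j,\mathcal{R},T} \ge \sigma^2_{j,\mathcal{R},T} - \phi_T$ on $\mathcal{A}$ with
$\phi_T = o_T(2^{-J_T}T^{-1})$. Together with Proposition 2, this implies

$$\frac{\hat\sigma^2_{j,\mathcal{R},T}}{\sigma^2_{j,\mathcal{R},T}} \ge 1 - \frac{\phi_T}{\sigma^2_{j,\mathcal{R},T}} = 1 - o_T(1) \to 1 \tag{B.17}$$

for all $j = -1, \ldots, -J_T$, as $T$ tends to infinity. We can then write

$$\Pr(|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > 2\hat\sigma_{j,\mathcal{R},T}\eta) \le \Pr\Bigl(|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| > 2\sigma_{j,\mathcal{R},T}\eta\sqrt{1 - \frac{\phi_T}{\sigma^2_{j,\mathcal{R},T}}} \Bigm| \mathcal{A}\Bigr)$$

and Proposition 3 leads to the result with $\gamma_T = \phi_T/\sigma^2_{j,\mathcal{R},T}$.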

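B.7. Proof of Proposition 5. Let $\mathcal{U}$ be a segment of $\wp(\mathcal{R})$. Consider the
a.s. inequality

$$|Q_{j,\mathcal{R};T} - Q_{j,\mathcal{U};T}| \le |Q_{j,\mathcal{R};T} - Q_{j,\mathcal{R}}| + |Q_{j,\mathcal{U};T} - Q_{j,\mathcal{U}}| + \Delta_j(\mathcal{R},\mathcal{U}),$$

where $\Delta_j(\mathcal{R},\mathcal{U})$ is defined in (4.1). In the regular case, $\Delta_j(\mathcal{R},\mathcal{U}) \le b(\mathcal{U}) + b(\mathcal{R}) \le C_j(\sigma_{j,\mathcal{U},T} + \sigma_{j,\mathcal{R},T})k_T$. Consequently, in the regular case,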

Pr(TZ is rejected) 

< ^ P v {\Qj,U;T -Qj,K;T\ >2 ( r l°j,U,T + Wj,'R.,T)kT} 

ue P (R) 

< ? T {\Qj.7L\T - Qj,n\ > ('j^j.R.i l- r + 2rja jt n,TkT) 

+ Y Pr (IQi,w ; T - Qj,u\ > -Cja jt u,Tkr + 2i]a jt u ,rkr) ■ 

Proposition 3 implies 
Pr(7£ is rejected) 



< (ttp(^))c exp|-^ ■T)\ j 



1 1 27 1T L 3 



\7tT\(TjTiT 
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, y/ 2 VT (K 2 \\c x \\ hoo +K 3 ) 



1 + 



Cj,n,T\'R>\vT 
2r] T Lj 



+ 



\UT\<jj,u,T 
2^ 2 VT {K 2 \\c x \\ l!00 + K 3 ) 



with T] T := 2i]k T - Cjk T = k T 2~^ 2 (b(2a + p) - y/a + p). 

Proposition 2 leads to cr~^. T < C~ 1 2~i I 2 \J\RT\ and similarly for crJj^ T . 
As 5 < \IL\ < \1Z\ < 1, we consider the dominant terms in the sum and can 
write, for T large enough and with 2~i' 2 Lj < pC—\, 



Pr(7£ is rejected) 

<2c (|ip(^))exp 



16 



Vt / 



1 + 



2rj T pC^] 



+ 



Vt(K 2 \\cx\\i,oo + K 3 ) 



Replacing t]t, using 2a+p> yja + p and kx ~ log 2 T, the asymptotic order 
of this bound is 

^p^^O^-(v / ^i/(^2||cx||i,oo+-fsT3))(a+p/2)-j 

and the result follows for T large enough by Assumption 7(2). 



B.8. Proof of Theorem 2. For the reader's convenience, we first state two
technical lemmas. The first lemma is a consequence of Rosenthal's inequality
[see, e.g., Härdle, Kerkyacharian, Picard and Tsybakov (1998)].

Lemma B.6. Let $Y \sim \mathcal{N}(0,\sigma^2)$ with $\sigma^2 > 0$. Then, $E|Y|^p \le C(p)\sigma^p$,
where $C(p)$ is a function of $p$ only.
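
As a numerical illustration of Lemma B.6 (not part of the original argument), recall the standard closed form for the absolute moments of a centered Gaussian variable, $E|Y|^p = \sigma^p\, 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$, so that one admissible choice is $C(p) = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$. The Python sketch below compares this closed form with a simple midpoint-rule quadrature. The function names and the quadrature settings are illustrative assumptions.

```python
import math


def gaussian_abs_moment(p: float, sigma: float) -> float:
    """Closed form of E|Y|^p for Y ~ N(0, sigma^2): C(p) * sigma^p."""
    c_p = 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
    return c_p * sigma ** p


def gaussian_abs_moment_numeric(p: float, sigma: float,
                                n: int = 200_000, width: float = 12.0) -> float:
    """Midpoint-rule approximation of the integral of |y|^p against the
    N(0, sigma^2) density, truncated at `width` standard deviations
    (an illustrative choice)."""
    upper = width * sigma
    h = upper / n
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += (y ** p) * math.exp(-0.5 * (y / sigma) ** 2) / norm * h
    return 2.0 * total  # the integrand is even in y


if __name__ == "__main__":
    for p, sigma in [(2, 1.5), (3, 0.7), (4.5, 2.0)]:
        print(p, sigma, gaussian_abs_moment(p, sigma),
              gaussian_abs_moment_numeric(p, sigma))
```

The ratio $E|Y|^p/\sigma^p$ depends on $p$ only, which is exactly the scaling used in the lemma.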

Lemma B.7. Let $Z_T = (Z_1, \ldots, Z_T)'$ be a vector of i.i.d. Gaussian random
variables with zero mean and $\operatorname{Var} Z_1 = 1$. If $M_{j,\mathcal{R};T}$ is the matrix (B.9),
$\nu$ is a positive constant and $p > 2$, then there exists $T_0$ such that

$$E(Z_T' M_{j,\mathcal{R};T} Z_T - \operatorname{tr} M_{j,\mathcal{R};T} + \nu k_T T^{-1/2})^p \le C(\kappa, \|c_X\|_{1,\infty}, p)\, T^{-p/2}\,(2^{1+j/2}|\mathcal{R}|^{-1} + \nu k_T)^p$$

for all $T > T_0$.
Proof. First, we write

$$E(Z_T' M_{j,\mathcal{R};T} Z_T - \operatorname{tr} M_{j,\mathcal{R};T} + \nu k_T T^{-1/2})^p = \sum_{r=0}^{p} \binom{p}{r} E(Z_T' M_{j,\mathcal{R};T} Z_T - \operatorname{tr} M_{j,\mathcal{R};T})^r\, \nu^{p-r} k_T^{p-r} T^{-(p-r)/2}. \tag{B.18}$$

Due to the relationship between the centered moments of a random variable
and its cumulants, we can write

$$E(Z_T' M_{j,\mathcal{R};T} Z_T - \operatorname{tr} M_{j,\mathcal{R};T})^r = \sum_{m=0}^{r} \sum c(p_1, \ldots, p_m, m, \pi_1, \ldots, \pi_m, r)\, K_{p_1}^{\pi_1} \cdots K_{p_m}^{\pi_m},$$

where the second sum is over $p_1, \ldots, p_m, \pi_1, \ldots, \pi_m$ in $\{1, \ldots, r\}$ such that
$\sum_{i=1}^{m} p_i \pi_i = r$, $K_{p_i}$ is the $p_i$th cumulant of $Z_T' M_{j,\mathcal{R};T} Z_T$ and $C$ denotes
a generic constant in this proof. From Lemma B.1, (B.13) and Proposition 2,
$K_{p_i} \le 2^{p_i}(p_i - 1)!\,\kappa^{p_i}\|c_X\|_{1,\infty}^{p_i}\, 2^{j p_i/2}|\mathcal{R}|^{-p_i} T^{-p_i/2}$ and, consequently,

$$E(Z_T' M_{j,\mathcal{R};T} Z_T - \operatorname{tr} M_{j,\mathcal{R};T})^r \le C(\kappa, \|c_X\|_{1,\infty}, r)\, 2^{r(1+j/2)}|\mathcal{R}|^{-r} T^{-r/2}.$$

Using this inequality in (B.18) leads to the result. □
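
The moment-cumulant step in the proof of Lemma B.7 can be illustrated on the simplest case. For a diagonal quadratic form $\sum_i \lambda_i Z_i^2$ with i.i.d. standard Gaussian $Z_i$, it is a standard fact that the $r$th cumulant equals $2^{r-1}(r-1)!\sum_i \lambda_i^r$ (that is, $2^{r-1}(r-1)!\operatorname{tr} M^r$ for a symmetric matrix $M$ with eigenvalues $\lambda_i$), and the low-order centered moments follow from $\mu_2 = \kappa_2$, $\mu_3 = \kappa_3$, $\mu_4 = \kappa_4 + 3\kappa_2^2$. The sketch below is a hedged illustration with hypothetical function names; it does not reproduce the constants of the lemma.

```python
import math
from typing import Dict, Sequence


def quad_form_cumulant(lambdas: Sequence[float], r: int) -> float:
    """r-th cumulant of sum_i lambda_i * Z_i^2 with Z_i i.i.d. N(0, 1)."""
    return 2 ** (r - 1) * math.factorial(r - 1) * sum(l ** r for l in lambdas)


def central_moments_up_to_4(lambdas: Sequence[float]) -> Dict[int, float]:
    """Centered moments of order 2, 3, 4 rebuilt from the cumulants."""
    k2 = quad_form_cumulant(lambdas, 2)
    k3 = quad_form_cumulant(lambdas, 3)
    k4 = quad_form_cumulant(lambdas, 4)
    return {2: k2, 3: k3, 4: k4 + 3.0 * k2 ** 2}


if __name__ == "__main__":
    # A single chi-square variable with one degree of freedom.
    print(central_moments_up_to_4([1.0]))
    # A weighted sum, as for a quadratic form with eigenvalues 0.5, 1, 2.
    print(central_moments_up_to_4([0.5, 1.0, 2.0]))
```

Each centered moment of order $r$ is a polynomial in cumulants whose orders sum to $r$, which is why a bound on every cumulant of the form (constant)$^{p_i}$ propagates to a bound of the form (constant)$^{r}$ on the centered moment.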

Proof of Theorem 2. In this proof, $C$ denotes a generic constant.
Let $\hat{\mathcal{R}}$ be the interval selected by the estimation procedure. We consider two
cases, $|\hat{\mathcal{R}}| \le |\mathcal{R}|$ and $|\hat{\mathcal{R}}| > |\mathcal{R}|$, and split the expectation into two parts as
follows:

E|^(z )-^(zo)r 

= E|5 i («b)-5 i (^)| p l|^| <w +E|5j(«b)-5 i (^)| p l|^|> w . 

First term ($|\hat{\mathcal{R}}| \le |\mathcal{R}|$). In the first case, we make use of the inequality
$|a - b|^p \le 2^{p-1}|a|^p + 2^{p-1}|b|^p$ and write
E\Sj{z ) - Sj{z )\ p l^ <m 

<2 P 1 E\S j (z ) - Q^| P 1|^|<| K | + 2 P 1 E\Q j ^. T - Qjjt\ p \n\<\n\- 

As $|\hat{\mathcal{R}}| \le |\mathcal{R}|$, the evolutionary wavelet spectrum is homogeneous over $\hat{\mathcal{R}}$
and $\mathcal{R}$, and property (4.4) holds for $\hat{\mathcal{R}}$. Then, using Proposition 2 on the
variance and the first point of Assumption 7, the first term of the right-hand
side is bounded as follows:

2 P 1 E 15^(^0) - Qj,n\ Pl \n\<\n\ 

<2 p ~ 1 E{C j a liiT kT) p 

(B.19) 

< 2 p - l C p k^ p ' 2 (T5 2 )- p ' 2 {l + c 2 ) p l 2 



2 p ~ l {a + V ) p l 2 k p T T- p l 2 b- p (\ + c 2 ) p/ 
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by the definition of Cj [see equation (4.5)] . Now, if we let Gt = Z^ T M- -n. T Z_ T + 
\1ZT\~ 1 J2keiiT z j,k;T — tvM- ^. T , then the second term may be written 

2 P ^ E \Gt + bias T |n^ |<|7e| < 2^ 2 {E(|G r | p l | ^ |<|7 , | ) + |bias T p} 5 
where, using Proposition 2 for T large enough, 
(B.20) |bias T | p < C p 2 jp / 2 (5T)- p / 2 , 

with a constant $C_p$ depending only on $p$. Finally, we now show that $E|G_T|^p$ is
uniformly bounded in $T$. Using $\delta \le |\hat{\mathcal{R}}| \le |\mathcal{R}|$, we first note that Propositions
2 and B.1 imply

(B.21) Pr(|C T | > ±J (C3 + c^) < 3exp(-i A " ) , 

where r M < 2 ( ~ j - 1 ^ 2 c/{5VT) by (B.13). We now truncate the integral E\G T \ P = 
Jq 00 dxPr(\Gx\ p > x) at the point /J^ 2 , which is such that fir = 2- J (C 2 + 
c 2 )/(5 2 T). With the change of variable x = y p fJ^ 2 , this leads to 

E\G T \ P < 4 2 + V 4 2 ^ dy y p ~ x Pr(|G T | > y^ 2 ) 

< Vt 2 + PVt 2 [ d y V P ~ X ex P (~ t: ~ / ) • 

Jl V 2 l + 2yT O0> J\'RT\/7?J 

To compute the integral, we note that 1 < y and evaluate dy exp(— atxy) ■ 
This leads to the bound 

e \g t \ p < 4 2 + *v4 2 (2 + ^°cy^f^f < C p 6~ p T- p / 2 . 

In conclusion, in the first case, we get the bound $E|\hat S_j(z_0) - S_j(z_0)|^p\,\mathbf{1}_{|\hat{\mathcal{R}}| \le |\mathcal{R}|} \le C_p \delta^{-p} T^{-p/2} k_T^p$ from (B.19) and (B.20).

Second term ($|\hat{\mathcal{R}}| > |\mathcal{R}|$). We now consider the second case. Select a
subinterval $\mathcal{U}$ in $\wp(\hat{\mathcal{R}})$ included in $\mathcal{R}$ and containing $z_0$. Then, consider the
decomposition

E\Sj(z ) ~ Sj(z )\ p lfe\>\-R,\ 

< E{\Q jM ~ Sj(z )\ + \Q jM -T ~ Qj,u\ + \Qj,n-,T ~ QjM;T\Y- 

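As the wavelet spectrum is regular on $\mathcal{U} \subset \mathcal{R}$, the term $|Q_{j,\mathcal{U}} - S_j(z_0)|$ is
bounded by $C_j \sigma_{j,\mathcal{U},T} k_T$. On the other hand, using Proposition 2, $|Q_{j,\mathcal{U};T} - Q_{j,\mathcal{U}}| = |Q_{j,\mathcal{U};T} - \operatorname{tr} M_{j,\mathcal{U};T}| + R_T$ with $R_T = O(2^{j/2}T^{-1/2})$. Moreover, as $\hat{\mathcal{R}}$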






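is selected by the estimation procedure, it holds that $|Q_{j,\hat{\mathcal{R}};T} - Q_{j,\mathcal{U};T}| \le 2(\eta\sigma_{j,\hat{\mathcal{R}},T} + \eta\sigma_{j,\mathcal{U},T})k_T$,
using $|\hat{\mathcal{R}}| > |\mathcal{U}| \ge \delta$. Then, Lemmas B.6 and B.7 prove the existence of a
constant $C_5$ depending on $\kappa$, $\nu$, $p$, $K_2$ and on $\|c_X\|_{1,\infty}$ such that, for $T > T_0$,

$$E\{|Q_{j,\mathcal{U};T} - \operatorname{tr} M_{j,\mathcal{U};T}| + R_T + C_j\sigma_{j,\mathcal{U},T}k_T + 2(\eta\sigma_{j,\hat{\mathcal{R}},T} + \eta\sigma_{j,\mathcal{U},T})k_T\}^p$$
$$\le C_5\,\delta^{-p}T^{-p/2}(2^{j/2}|\mathcal{U}|^{-1} + k_T)^p + C_p 2^{jp/2}|\mathcal{U}T|^{-p/2}$$

and the result follows using $|\hat{\mathcal{R}}| \ge \delta$. □

B.9. Proof of Proposition 6. We first prove the following lemma, stating
an exponential inequality for quadratic forms of Gaussian random variables.

Lemma B.8. Let $Z_T = (Z_1, \ldots, Z_T)'$ be a vector of i.i.d. Gaussian random
variables with zero mean and $\operatorname{Var} Z_1 = 1$. If $M_T$ is a $T \times T$ symmetric
and positive definite matrix, then

$$\Pr(Z_T' M_T Z_T \le \eta) \le \exp\Bigl\{-\frac{(\operatorname{tr} M_T - \eta)^2}{4\|M_T\|_2^2}\Bigr\}$$

provided that $\eta < \operatorname{tr} M_T$.

Proof. By assumption on the matrix $M_T$, the decomposition $M_T = O_T \Lambda_T O_T'$
holds with a diagonal $T \times T$ matrix $\Lambda_T$ and an orthonormal
matrix $O_T$. If we let $Y_T = O_T' Z_T$, then $Y_T$ is a vector of i.i.d. Gaussian
random variables with zero mean and $\operatorname{Var} Y_1 = 1$. We can write $Z_T' M_T Z_T = Y_T' \Lambda_T Y_T = \sum_{i=1}^{T} \lambda_i Y_i^2$, with $\lambda_i > 0$. Moreover, $\operatorname{tr} M_T = \operatorname{tr} \Lambda_T$, $\operatorname{tr} \Lambda_T^2 = \operatorname{tr} M_T^2 = \|M_T\|_2^2$
and $\|M_T\|_{\mathrm{spec}} = \max\{\lambda_1, \ldots, \lambda_T\}$. The Chernoff inequality [Ross
(1998)] on $Y_T$ leads to

$$\Pr(Z_T' M_T Z_T \le \eta) = \Pr(Y_T' \Lambda_T Y_T \le \eta) \le \exp\Bigl\{\inf_{t<0}\Bigl(-t\eta + \sum_{i=1}^{T} \log E\exp(t\lambda_i Y_i^2)\Bigr)\Bigr\}$$

and, using the fact that $\log E\exp(a_i Y_i^2) = -\tfrac{1}{2}\log(1 - 2a_i) \le a_i + a_i^2$ holds
for $a_i < 0$, we get

$$\Pr(Z_T' M_T Z_T \le \eta) \le \exp\Bigl\{\inf_{t<0}(-t\eta + t\operatorname{tr}\Lambda_T + t^2\operatorname{tr}\Lambda_T^2)\Bigr\}.$$

The result follows by taking $t = (\eta - \operatorname{tr}\Lambda_T)/(2\operatorname{tr}\Lambda_T^2)$. □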
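
The two elementary facts behind the Chernoff argument for Gaussian quadratic forms can be checked numerically. With $Y \sim \mathcal{N}(0,1)$ and $a < 0$, the log moment generating function of $Y^2$ is $-\tfrac12\log(1-2a)$ and is bounded by $a + a^2$; and the quadratic $-t\eta + t\operatorname{tr}\Lambda + t^2\operatorname{tr}\Lambda^2$ is minimized at $t^\ast = (\eta - \operatorname{tr}\Lambda)/(2\operatorname{tr}\Lambda^2)$, which is negative whenever $\eta < \operatorname{tr}\Lambda$, with minimum $-(\operatorname{tr}\Lambda - \eta)^2/(4\operatorname{tr}\Lambda^2)$. The Python sketch below is illustrative only; the function names and the example eigenvalues are assumptions.

```python
import math
from typing import Sequence, Tuple


def log_mgf_chi2_term(a: float) -> float:
    """log E exp(a * Y^2) for Y ~ N(0, 1); finite for a < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * a)


def chernoff_exponent(eta: float, lambdas: Sequence[float]) -> Tuple[float, float]:
    """Minimise -t*eta + t*tr(L) + t^2*tr(L^2) over t.

    Returns the minimiser t_star and the minimum value, where L is the
    diagonal matrix of the (positive) eigenvalues `lambdas`.
    """
    tr1 = sum(lambdas)
    tr2 = sum(l * l for l in lambdas)
    t_star = (eta - tr1) / (2.0 * tr2)
    value = -t_star * eta + t_star * tr1 + t_star ** 2 * tr2
    return t_star, value


if __name__ == "__main__":
    # Cumulant bound on a grid of negative arguments.
    grid = [-0.01 * i for i in range(1, 501)]
    print(all(log_mgf_chi2_term(a) <= a + a * a for a in grid))
    # Exponent of the tail bound for hypothetical eigenvalues.
    print(chernoff_exponent(1.0, [1.0, 2.0, 0.5]))
```

The exponent returned by `chernoff_exponent` is the logarithm of the upper bound on $\Pr(\sum_i \lambda_i Y_i^2 \le \eta)$ obtained from this argument.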

Lemma B.8 is not directly applicable to the quadratic form $Q_{j,\mathcal{R};T} = Z_T' M_{j,\mathcal{R};T} Z_T$
because the matrix $M_{j,\mathcal{R};T}$ is not positive definite in general.
In the next lemma, we show how this matrix can be approximated by the
matrix $M^*_{j,\mathcal{R};T}$, defined as

m j,1Z;T — u j,1Z;T^T ' 

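where the entry $(s,t)$ of the matrix $U^*_{j,\mathcal{R};T}$ is given by

$$U^*_{st} = 2\gamma_0 |\mathcal{R}T|^{-1} \sum_{\ell=-\log_2 T}^{-1} 2^{\ell/2}\,\Psi_\ell(s-t),$$

with $\gamma_0 \ge \sup_{j<0}\sup_{\ell<0} 2^{-\ell/2}|A^{-1}_{j\ell}| > 0$. The matrix $M^*_{j,\mathcal{R};T}$ is clearly symmetric.
It is also positive definite because $U^*_{j,\mathcal{R};T}$ is positive definite: for all
nonzero sequences $x = (x_0, \ldots, x_{T-1})'$ of $\ell^2$, the quadratic form

$$x' U^*_{j,\mathcal{R};T}\, x = 2\gamma_0 |\mathcal{R}T|^{-1} \sum_{\ell=-\log_2 T}^{-1} 2^{\ell/2} \sum_{k} \Bigl(\sum_{s} x_s \psi_{\ell k}(s)\Bigr)^2$$

is strictly positive.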
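
The sum-of-squares identity behind this positive definiteness argument can be checked numerically. The sketch below assumes discrete Haar wavelets (an assumption made only for illustration; the text does not fix the wavelet family here), uses the convention $\psi_{\ell k}(s) = \psi_\ell(s-k)$ and $\Psi_\ell(\tau) = \sum_m \psi_\ell(m)\psi_\ell(m+|\tau|)$, and omits the positive constant factor in front of the sums. All function names are hypothetical.

```python
from typing import List, Sequence


def haar_wavelet(j: int) -> List[float]:
    """Discrete Haar wavelet at scale j = -1, -2, ... with unit l2 norm."""
    half = 2 ** (-j - 1)
    amp = 2.0 ** (j / 2.0)
    return [amp] * half + [-amp] * half


def autocorrelation_wavelet(j: int, tau: int) -> float:
    """Psi_j(tau) = sum_m psi_j(m) * psi_j(m + |tau|)."""
    psi = haar_wavelet(j)
    lag = abs(tau)
    return sum(psi[m] * psi[m + lag] for m in range(len(psi) - lag))


def quad_form_via_autocorrelation(x: Sequence[float], scales: Sequence[int]) -> float:
    """sum_j 2^{j/2} sum_{s,t} x_s x_t Psi_j(s - t)."""
    T = len(x)
    return sum(2.0 ** (j / 2.0) * x[s] * x[t] * autocorrelation_wavelet(j, s - t)
               for j in scales for s in range(T) for t in range(T))


def quad_form_via_squares(x: Sequence[float], scales: Sequence[int]) -> float:
    """sum_j 2^{j/2} sum_k (sum_s x_s psi_j(s - k))^2, over all overlapping shifts k."""
    T = len(x)
    total = 0.0
    for j in scales:
        psi = haar_wavelet(j)
        n = len(psi)
        for k in range(-n + 1, T):
            inner = sum(x[s] * psi[s - k] for s in range(T) if 0 <= s - k < n)
            total += 2.0 ** (j / 2.0) * inner ** 2
    return total


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0]
    scales = [-1, -2]
    print(quad_form_via_autocorrelation(x, scales), quad_form_via_squares(x, scales))
```

Both evaluations agree, and the second one is visibly a weighted sum of squares, which is the reason the quadratic form cannot be negative.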

Lemma B.9. Suppose that Assumptions 1-4 hold true. Define 71 such 
that 

-1 

< 71 < 70 inf V 2 f / 2 ^ m£ . 

m<0 * — ' 

<?=-log 2 T 

The following properties hold true for $T$ sufficiently large:
(B.22) 7 i|^r 1 e < tr(M* 7e . T - M ji7e;T ) < 6||cx, T ||i l oo'yb|ftr 1 , 
where e is defined in Assumption 2, 

\\ M j,K;T ~ M j s K;T \ | spec 

(B.23) <l|M^. r -M^ ;r ||2 

< s^no^l^llcxIliooT-MogKr) + ocr- 1 ) 

and, if Z_ T = (Zi, . . . ,Zt)' is a vector of i.i.d. Gaussian random variables 
with zero mean and VarZi = 1, then 

(B.24) Pr(Z T (M^ ;T - M m , T )Z T > At) = O (expj- }) , 
w/iere A T = tr M*^ ;T - tr + tr M,- log 2 1 T. 
Proof. 1. We prove (B.22). Write $\operatorname{tr}(M^*_{j,\mathcal{R};T} - M_{j,\mathcal{R};T}) = \operatorname{tr}(M^*_{j,\mathcal{R};T}) - \operatorname{tr}(M_{j,\mathcal{R};T})$,
where the second term is $E(Z_T' M_{j,\mathcal{R};T} Z_T) = Q_{j,\mathcal{R}} + O(2^{j/2}T^{-1/2})$,
from Lemma B.1 and Proposition 2. Moreover,

tr(M*^. T ) 

= tl CS'rpUj^.rp) 

OO / s —1 

S 



(B.25) =2 70 |^T|- 1 ]T c xA^u\ 2 ' /2 ^W 

OO , s — 1 

(B.26) =2 7o |ftT|- 1 Y C A^ U ) E 2^ £ (u) + Rest T . 

s,u=-oo ^ ' £=-log 2 T 

We now derive a bound for $\mathrm{Rest}_T$. Define $\Delta_T(s/T,u) := c_{X,T}(s/T,u) - c_X(s/T,u)$.
We first show that $\mathrm{TV}(\Delta_T(\cdot,u))$ is uniformly bounded in $u$.
For all $I \in \{1, \ldots, T\}$ and every sequence $0 < a_1 < a_2 < \cdots < a_I < 1$, we can
write
A T (ai,u) - A T (a,i-i,u) 

-1 oo 



E E {^'(10 ~ 5 'i( a i)}V ; ifc([«^])V'ifc(kr] + u) 
j=— oo k=— oo 

E E {^(^)-^(^-i)} 



j=~oc k=— oo 

x ^ fc ([ai_iT])V jfe ([ai_ir] + u) + 0(T- 1 ), 

where the $O(T^{-1})$ term comes from the approximation (2.2). Now, replace
$k$ by $k + [a_i T]$ in the first sum and by $k + [a_{i-1} T]$ in the second one. The
main term becomes

E E | 5 'i(^ + a *) - S i{^ + a i'^j +S j {a i -i)-S j (a i )^ jk (ti)ip jk (u). 

j — oo k — — oo 

Consequently, using the Cauchy-Schwarz inequality and Definition 1, 
Y{A-T(ai,u) - A T (ai-i,u)} 

i=l 

— 1 oo 

< 2 E l j E i^WifcHi + oUT- 1 ) 

j'=— log 2 T k=— oo 

< 2p + K, 
where $K$ is a constant (because $I \le T$), leading to $\mathrm{TV}(\Delta_T(\cdot,u)) \le 2\rho + K$,
uniformly in $u$. We can now bound $\mathrm{Rest}_T$ in (B.26) as follows:

CO , s — 1 

Rest T = 2 7o |^rr 1 Yu A t(^) E 2 ^ 2 ^(n) 



= ^ E l /T dz[A T (z, u) + A T (±, „) - A T (,, «)} E 2 ^ 

<St I' dzJ2\ A T(z,u)\ 
it Jo V 



■( ^,u) - A T (z + ^,u 



as $|\Psi_\ell(u)|$ is uniformly bounded by 1. From Proposition 1, the first term is
$O(|\mathcal{R}T|^{-1})$. Using (B.5) and the fact that $\mathrm{TV}(\Delta_T(\cdot,u))$ is uniformly bounded
in $u$, the second term is also $O(|\mathcal{R}T|^{-1})$.

In (B.26), we now expand $c_X(s/T,u)$ using (2.6). By the definition of the
matrix $A$, we get



tr(M^ ;T - M hU , T ) > \KT\- 1 EE^U) E(27o - ^A^)^A 



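for $T$ large enough. The lower bound is derived from the definition of $\gamma_0$, $\gamma_1$
and Assumption 2. The upper bound is derived using $\operatorname{tr}(M^*_{j,\mathcal{R};T} - M_{j,\mathcal{R};T}) \le \operatorname{tr}(M^*_{j,\mathcal{R};T})$
from (B.25), Assumption 1 and the fact that $|\Psi_\ell(u)| \le 1$ uniformly
in $\ell < 0$ and $u \in \mathbb{Z}$.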

2. We prove (B.23). The first inequality is (B.l). From (B.4), we write 
\\M* n . T - Mj,n;T^ < ^ 1/2 \\i P e C \\Ul n . T ~ Uj^tWI Then - usin § Lemma 
B.2, (A.5) and y/C^ < 2-( e+m V 2 C-i, 

^ H^^tHI + ll^'^Tlli 

<4 7o 2 |^r 2 T- 1 E 2^ +m ^ 2 A im + Kl^\R\- 2 T- 1 

m,£=- log 2 T 

< 4£_ l7 2 |^r 2 T- 1 log 2 (T) + 0{T- 1 ). 

The result follows from Lemma B.3. 

3. We prove (B.24). For $T$ large enough, $\lambda_T$ is strictly positive. Using Proposition
B.1 and defining $\rho_T^2 = \operatorname{Var}(Z_T'(M^*_{j,\mathcal{R};T} - M_{j,\mathcal{R};T})Z_T) = 2\|M^*_{j,\mathcal{R};T} - M_{j,\mathcal{R};T}\|_2^2$
and $\varrho_T = \|M^*_{j,\mathcal{R};T} - M_{j,\mathcal{R};T}\|_{\mathrm{spec}}$, we can write
Pr(Z T (M*^. T - M hU . T )Z T > A T ) 

. / 1 (trM^ ;T ) 2 

< eX P( -n — ,2, 



2 ^log2T + 2( ?T tr(M^. T )log 2 r 
(B.23) gives the rates for $\rho_T$ and $\varrho_T$, leading to the result. □

Proof of Proposition 6. By Proposition 2, we have $\theta_T = Q_{j,\mathcal{R}_0} - Q_{j,\mathcal{R}_1} + O(2^{j/2}/\{\sqrt{T}(|\mathcal{R}_0| \wedge |\mathcal{R}_1|)\})$.
This shows that the sign of $\theta_T$ is determined
by the sign of $(Q_{j,\mathcal{R}_0} - Q_{j,\mathcal{R}_1})$ for $T$ large enough. We then consider
the two cases $\theta_T > 0$ and $\theta_T < 0$.

If 9 T > 0, define fi T = E(Qj,n ;T - Qj,lZ;T) > and At = tr(Af*^ o;T - 
Mj n . T ) — /xt(1 — l/fo§2^ 1 )i where the matrices M* are defined as in Lem- 
ma B.9. Define the random set V T = {Z! T { M ln Q ;T ~ M j,K;T ~ M j,K ;T + 
Mj t -ji-T)Z_T — At} 5 where Z_j> = (Z\, . . . ,Zt)' is a vector of i.i.d. Gaussian 
random variables. As for the derivation of (B.24), we can use Proposition 
B.l to derive 



log|TV|^o| 2 

Using decomposition (B.7) and by conditioning on Vt, 
Pt(1Z is not rejected | Vt) 

< Pr{Z T (M*^ o;T - M* n . T )Z_ T + qj t Ti ]T ~ 1j,TZ;T 

< 2r](a jt n ,T + cr j: n,T)kT + AtI^t}- 

Note that the first inequality of Proposition B.l implies that Pr{|g^ o . r — 

Ijll-rl > ( a j,Tlo\T + crj t Ti-T)XkT} < 2exp(— \ 2 k T /2). Therefore, by the defini- 
tion of r), 

Pr(7£ is not rejected!"^) 

< 0(T~ V ) + Pr{Z T (M^ o;T - M* n . T )Z T 

< v(^j,n ,T + crj,n,T)kT + At|7>t}- 

Lemma B.8 can now be used to bound this probability because M*^ o . T — 
Mjn-T i s a positive definite matrix and ^(t T j,7£ ,T + c r j,7£,T)^T + At < tr(M*^ o . T ■ 

Mjiz-r) fo r ^ large enough. This leads to the rate 0(— ^rf( JIT^ 7 ^ l7^p) _1 )' 
If 9t < 0, then we apply the same reasoning with fix = ~E(Qj,ii 1 ;T — Qj,7i;T) 
and A T = tr(M^ i;T - M* n . T ) + ^ T (l - l/log 2 T). The result follows after 
the addition of all terms. □ 